[Feature] Add modular tracking interface with MLflow backend #1591

Open
mouad-hpc wants to merge 3 commits into THUDM:main from mouad-hpc:feat/mlflow-tracking

@mouad-hpc

Summary

  • Adds TrackingBackend ABC and TrackingManager for pluggable logging backends
  • Adds MLflow as a new tracking backend alongside wandb and tensorboard
  • Refactors logging_utils.py to delegate to the manager instead of hardcoded conditionals
  • New CLI flags: --use-mlflow, --mlflow-tracking-uri, --mlflow-experiment-name, --mlflow-run-name
  • MLflow is an optional dependency (pip install slime[mlflow])
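
A minimal sketch of the shape described above, for reviewers who want the idea before reading the diff. Only the names TrackingBackend and TrackingManager come from this PR; the method names (log, finish, register) and the InMemoryBackend class are assumptions made for illustration and may not match slime/utils/tracking.py.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class TrackingBackend(ABC):
    """Interface a logging backend implements (method names are assumed)."""

    @abstractmethod
    def log(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        """Record a dictionary of metrics at an optional step."""

    @abstractmethod
    def finish(self) -> None:
        """Flush and close the backend."""


class InMemoryBackend(TrackingBackend):
    """Hypothetical stand-in backend that only records calls."""

    def __init__(self) -> None:
        self.records: List[Tuple[Optional[int], Dict[str, float]]] = []
        self.finished = False

    def log(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        # Copy the dict so later mutation by the caller does not alter history.
        self.records.append((step, dict(metrics)))

    def finish(self) -> None:
        self.finished = True


class TrackingManager:
    """Fans every call out to all registered backends."""

    def __init__(self) -> None:
        self._backends: List[TrackingBackend] = []

    def register(self, backend: TrackingBackend) -> None:
        self._backends.append(backend)

    def log(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        for backend in self._backends:
            backend.log(metrics, step=step)

    def finish(self) -> None:
        for backend in self._backends:
            backend.finish()
```

With this shape, call sites talk to a single manager and never branch on which backends are enabled, which is the point of replacing the hardcoded conditionals.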

Files changed

  • slime/utils/tracking.py (new) - shared interface + adapters + registry
  • slime/utils/mlflow_utils.py (new) - MLflow backend implementation
  • slime/utils/logging_utils.py - uses TrackingManager internally, same public API
  • slime/utils/arguments.py - MLflow CLI flags
  • setup.py - mlflow added as optional extra
  • requirements.txt - mlflow removed from core deps
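
For orientation, the four new flags could be registered roughly as follows. The flag names are the ones listed in this PR; the helper name add_mlflow_arguments, the argument group, the types, the defaults, and the help strings are assumptions and may differ from slime/utils/arguments.py.

```python
import argparse


def add_mlflow_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the MLflow flags on an existing parser (hypothetical helper)."""
    group = parser.add_argument_group("mlflow")
    group.add_argument(
        "--use-mlflow",
        action="store_true",
        help="Enable the MLflow tracking backend.",
    )
    # Defaults of None are an assumption; they leave the choice to the backend.
    group.add_argument("--mlflow-tracking-uri", type=str, default=None)
    group.add_argument("--mlflow-experiment-name", type=str, default=None)
    group.add_argument("--mlflow-run-name", type=str, default=None)
    return parser
```

argparse maps the dashes to underscores, so the values are read back as args.use_mlflow, args.mlflow_tracking_uri, and so on.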

Test plan

  • Verified e2e functionality on 8xH100 GPU cluster
  • Full training run with --use-mlflow logging metrics end-to-end
  • Verify secondary rank attachment via mlflow_run_id
  • Verify no regression when mlflow is not installed
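
The last check above can be reasoned about with a small guard like the one below. This is a sketch under stated assumptions, not the code in this PR: module_available and enabled_backends are hypothetical names, and the real backend selection may be structured differently. The idea is that mlflow is only looked up when --use-mlflow is set, so runs without the extra installed are unaffected.

```python
import importlib.util
from typing import Callable, List


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def enabled_backends(
    use_mlflow: bool,
    is_available: Callable[[str], bool] = module_available,
) -> List[str]:
    """Pick backends to enable (hypothetical helper, MLflow only)."""
    backends: List[str] = []
    if use_mlflow:
        # Only probe for mlflow when the user explicitly asked for it.
        if not is_available("mlflow"):
            raise ImportError(
                "--use-mlflow requires the optional extra: pip install slime[mlflow]"
            )
        backends.append("mlflow")
    return backends
```

Passing is_available as a parameter lets the "mlflow not installed" path be exercised in a test without uninstalling anything.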

mouad-hpc marked this pull request as ready for review on February 17, 2026 01:24
mouad-hpc changed the base branch from main to feature/ci on February 17, 2026 01:25
@mouad-hpc (Author)

@zhuzilin For review

mouad-hpc changed the base branch from feature/ci to main on February 17, 2026 01:28
mouad-hpc force-pushed the feat/mlflow-tracking branch from 6358178 to c539e1b on February 17, 2026 01:44
mouad-hpc marked this pull request as draft on February 17, 2026 01:45
mouad-hpc

This comment was marked as spam.

mouad-hpc marked this pull request as ready for review on February 17, 2026 01:49
mouad-hpc force-pushed the feat/mlflow-tracking branch from c539e1b to f4fcce8 on February 19, 2026 20:57