
Commit cd1c7ed

docs: add MLflow integration documentation to fine-tuning examples
- Add shared MLflow guide (examples/fine-tuning/mlflow.md) covering enabling the operator, creating the CR, and viewing experiments
- Link to the shared guide from lora, osft, and sft READMEs
- Add screenshots showing the Experiments page and run metrics
- Note that the KB article link requires Red Hat Customer Portal login

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 6f714cc commit cd1c7ed

6 files changed

Lines changed: 77 additions & 0 deletions


examples/fine-tuning/lora/README.md

Lines changed: 4 additions & 0 deletions
@@ -172,3 +172,7 @@ to seamlessly run fine-tuning jobs.
> You can skip the token if switching to non-gated models.

You can now proceed with the instructions from the notebook. Enjoy!

## MLflow Integration (Optional)

The interactive notebook supports optional MLflow experiment tracking. See the [MLflow Integration guide](../mlflow.md) for setup instructions and details.

examples/fine-tuning/mlflow.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# MLflow Integration (Optional)

Training Hub supports [MLflow](https://mlflow.org/) for experiment tracking. When MLflow is enabled on your RHOAI cluster, training metrics (loss, learning rate, etc.) are automatically logged to MLflow experiments. No code changes are required beyond setting the experiment name.

> [!NOTE]
> MLflow integration is available for **interactive (single node)** notebooks only. Distributed training jobs do not currently support MLflow tracking.

## Enabling MLflow

Each interactive notebook already includes a cell that sets the MLflow experiment name:

```python
os.environ["MLFLOW_EXPERIMENT_NAME"] = "<your-experiment-name>"
```

For this to work, MLflow must be enabled as a component in your RHOAI installation. If MLflow is not enabled, the environment variable is ignored and training proceeds normally.
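
As a minimal sketch of the cell described above, the snippet below sets the experiment name and reports whether a tracking server appears to be configured. `MLFLOW_EXPERIMENT_NAME` and `MLFLOW_TRACKING_URI` are standard MLflow environment variables. The helper name `configure_experiment`, the example experiment name, and the assumption that an MLflow-enabled cluster exposes the tracking server to the notebook through `MLFLOW_TRACKING_URI` are illustrative and not confirmed by this guide.

```python
import os


def configure_experiment(name: str) -> bool:
    """Set the MLflow experiment name; return True if a tracking URI is set.

    Hypothetical helper: assumes the tracking server, when enabled, is
    exposed to the notebook via the MLFLOW_TRACKING_URI variable.
    """
    if not name.strip():
        raise ValueError("experiment name must not be empty")
    os.environ["MLFLOW_EXPERIMENT_NAME"] = name
    return bool(os.environ.get("MLFLOW_TRACKING_URI"))


if __name__ == "__main__":
    tracked = configure_experiment("lora-fine-tuning-demo")  # example name
    print("tracking URI found" if tracked else "no tracking URI set")
```

A check like this only inspects the notebook environment. It does not contact the tracking server, so it cannot confirm that logging will succeed.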

**To enable MLflow on your cluster:**

1. Enable the MLflow Operator component in your `DataScienceCluster` CR:

   ```bash
   oc patch datasciencecluster default-dsc \
     --type=merge \
     -p '{"spec":{"components":{"mlflowoperator":{"managementState":"Managed"}}}}'
   ```

2. Create an `MLflow` CR to deploy the tracking server (example using SQLite and a PV for storage):

   ```bash
   oc apply -f - <<EOF
   apiVersion: mlflow.opendatahub.io/v1
   kind: MLflow
   metadata:
     name: mlflow
   spec:
     backendStoreUri: "sqlite:////mlflow/mlflow.db"
     defaultArtifactRoot: "file:///mlflow/artifacts"
     serveArtifacts: true
     storage:
       accessModes:
         - ReadWriteOnce
       resources:
         requests:
           storage: 10Gi
   EOF
   ```

For full details, see the [Configuring MLflow in OpenShift AI](https://access.redhat.com/articles/7136121) Knowledgebase article (requires Red Hat Customer Portal login).
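
The two steps above can also be sketched as data, which may help when generating the patch and the CR from a script. This is a sketch under stated assumptions: the field names and values are copied from the commands above, the nesting of `storage` follows the usual PVC-style layout, and the function names `dsc_merge_patch` and `mlflow_manifest` are hypothetical.

```python
import json


def dsc_merge_patch(state: str = "Managed") -> str:
    """Return the JSON merge patch that sets the MLflow operator state."""
    patch = {"spec": {"components": {"mlflowoperator": {"managementState": state}}}}
    # Compact separators reproduce the single-line form passed to `oc patch -p`.
    return json.dumps(patch, separators=(",", ":"))


def mlflow_manifest(name: str = "mlflow", size: str = "10Gi") -> dict:
    """Return the MLflow CR from step 2 as a plain dictionary."""
    return {
        "apiVersion": "mlflow.opendatahub.io/v1",
        "kind": "MLflow",
        "metadata": {"name": name},
        "spec": {
            "backendStoreUri": "sqlite:////mlflow/mlflow.db",
            "defaultArtifactRoot": "file:///mlflow/artifacts",
            "serveArtifacts": True,
            # Assumed nesting: storage holds a PVC-style request.
            "storage": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": size}},
            },
        },
    }


if __name__ == "__main__":
    print(dsc_merge_patch())
    print(json.dumps(mlflow_manifest(), indent=2))
```

Because JSON is accepted wherever YAML manifests are, the second printed document could be saved to a file and passed to `oc apply -f`, although the heredoc shown in step 2 remains the simpler route.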
50+
51+
## Viewing MLflow Experiments
52+
53+
Once training completes with MLflow enabled, you can browse your experiment runs:
54+
55+
1. In the OpenShift AI dashboard, navigate to **Develop & train → Experiments** from the left sidebar menu.
56+
2. Select the experiment name to view all runs.
57+
3. Each run contains logged metrics (training loss, learning rate), parameters, and artifacts.
58+
59+
You can also launch the full MLflow UI by clicking the **"Launch MLflow"** link in the top right of the Experiments page:
60+
61+
![MLflow experiments page](./images/mlflow-experiments.png)
62+
63+
Each run logs metrics including training loss, learning rate, samples per second, and more:
64+
65+
![MLflow run metrics](./images/mlflow-run-metrics.png)
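
Besides the dashboard, runs can in principle be inspected through the MLflow REST API. The sketch below only builds the requests and does not send them. The endpoint paths (`experiments/get-by-name` and `runs/search`) come from the MLflow REST API, while the base URL, the function names, and the experiment ID are hypothetical. How the tracking server is exposed and authenticated on OpenShift AI is not covered by this guide, so treat the URL and the absence of auth headers as assumptions.

```python
import json
import urllib.parse
import urllib.request


def experiment_lookup_request(base_url: str, experiment_name: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that looks up an experiment by name."""
    query = urllib.parse.urlencode({"experiment_name": experiment_name})
    url = f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/get-by-name?{query}"
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def run_search_request(base_url: str, experiment_id: str, max_results: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request that searches runs in one experiment."""
    body = json.dumps({"experiment_ids": [experiment_id], "max_results": max_results})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/runs/search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    base = "https://mlflow.example.invalid"  # hypothetical tracking server URL
    lookup = experiment_lookup_request(base, "<your-experiment-name>")
    search = run_search_request(base, "1")  # hypothetical experiment ID
    print(lookup.get_method(), lookup.full_url)
    print(search.get_method(), search.full_url)
```

Sending either request would use `urllib.request.urlopen`, most likely with whatever credentials the cluster requires.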

examples/fine-tuning/osft/README.md

Lines changed: 4 additions & 0 deletions
@@ -207,3 +207,7 @@ These images serve both as training runtime and jupyter notebook images and come
> You can skip the token if switching to non-gated models.

You can now proceed with the instructions from the notebook. Enjoy!

## MLflow Integration (Optional)

The interactive notebook supports optional MLflow experiment tracking. See the [MLflow Integration guide](../mlflow.md) for setup instructions and details.

examples/fine-tuning/sft/README.md

Lines changed: 4 additions & 0 deletions
@@ -154,3 +154,7 @@ to seamlessly run fine-tuning jobs.
> You can skip the token if switching to non-gated models.

You can now proceed with the instructions from the notebook. Enjoy!

## MLflow Integration (Optional)

The interactive notebook supports optional MLflow experiment tracking. See the [MLflow Integration guide](../mlflow.md) for setup instructions and details.
