Commit 5b069d1

docs: address PR feedback - deduplicate MLflow content and update screenshots
- Extract shared MLflow documentation into examples/fine-tuning/mlflow.md
- Replace duplicated content in lora/osft/sft READMEs with link to shared doc
- Update screenshots: remove email from top-right, use generic "fine-tuning" experiment name instead of method-specific names

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 63f5546 commit 5b069d1

6 files changed

Lines changed: 68 additions & 189 deletions


examples/fine-tuning/lora/README.md

Lines changed: 1 addition & 63 deletions
@@ -175,66 +175,4 @@ You can now proceed with the instructions from the notebook. Enjoy!
 
 ## MLflow Integration (Optional)
 
-Training Hub supports [MLflow](https://mlflow.org/) for experiment tracking. When MLflow is enabled on your RHOAI cluster, training metrics (loss, learning rate, etc.) are automatically logged to MLflow experiments — no additional code changes required beyond setting the experiment name.
-
-> [!NOTE]
-> MLflow integration is available for **interactive (single node)** notebooks only. Distributed training jobs do not currently support MLflow tracking.
-
-### Enabling MLflow
-
-The interactive notebook already includes a cell that sets the MLflow experiment name:
-
-```python
-os.environ["MLFLOW_EXPERIMENT_NAME"] = "lora-training"
-```
-
-For this to work, MLflow must be enabled as a component in your RHOAI installation. If MLflow is not enabled, the environment variable is simply ignored and training proceeds normally.
-
-**To enable MLflow on your cluster:**
-
-1. Enable the MLflow Operator component in your `DataScienceCluster` CR:
-
-```bash
-oc patch datasciencecluster default-dsc \
-  --type=merge \
-  -p '{"spec":{"components":{"mlflowoperator":{"managementState":"Managed"}}}}'
-```
-
-2. Create an `MLflow` CR to deploy the tracking server (example using SQLite and a PV for storage):
-
-```bash
-oc apply -f - <<EOF
-apiVersion: mlflow.opendatahub.io/v1
-kind: MLflow
-metadata:
-  name: mlflow
-spec:
-  backendStoreUri: "sqlite:////mlflow/mlflow.db"
-  defaultArtifactRoot: "file:///mlflow/artifacts"
-  serveArtifacts: true
-  storage:
-    accessModes:
-      - ReadWriteOnce
-    resources:
-      requests:
-        storage: 10Gi
-EOF
-```
-
-For full details, see the [Configuring MLflow in OpenShift AI](https://access.redhat.com/articles/7136121) Knowledgebase article.
-
-### Viewing MLflow Experiments
-
-Once training completes with MLflow enabled, you can browse your experiment runs:
-
-1. In the OpenShift AI dashboard, navigate to **Develop & train → Experiments** from the left sidebar menu.
-2. Select the experiment name (e.g., `lora-training`) to view all runs.
-3. Each run contains logged metrics (training loss, learning rate), parameters, and artifacts.
-
-You can also launch the full MLflow UI by clicking the **"Launch MLflow"** link in the top right of the Experiments page:
-
-![](../images/mlflow-experiments.png)
-
-Each run logs metrics including training loss, learning rate, samples per second, and more:
-
-![](../images/mlflow-run-metrics.png)
+The interactive notebook supports optional MLflow experiment tracking. See the [MLflow Integration guide](../mlflow.md) for setup instructions and details.

examples/fine-tuning/mlflow.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# MLflow Integration (Optional)
+
+Training Hub supports [MLflow](https://mlflow.org/) for experiment tracking. When MLflow is enabled on your RHOAI cluster, training metrics (loss, learning rate, etc.) are automatically logged to MLflow experiments — no additional code changes required beyond setting the experiment name.
+
+> [!NOTE]
+> MLflow integration is available for **interactive (single node)** notebooks only. Distributed training jobs do not currently support MLflow tracking.
+
+## Enabling MLflow
+
+Each interactive notebook already includes a cell that sets the MLflow experiment name:
+
+```python
+os.environ["MLFLOW_EXPERIMENT_NAME"] = "<your-experiment-name>"
+```
+
+For this to work, MLflow must be enabled as a component in your RHOAI installation. If MLflow is not enabled, the environment variable is simply ignored and training proceeds normally.
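The notebook cell quoted above assigns the variable unconditionally. As a minimal sketch of a more defensive variant, the helper below sets `MLFLOW_EXPERIMENT_NAME` only when it is not already defined and rejects an empty name. `set_experiment_name` is a hypothetical name used for illustration; it is not part of Training Hub or MLflow, and the `env` parameter exists only so the logic can be exercised without touching the real process environment.

```python
import os
from typing import MutableMapping


def set_experiment_name(
    name: str, env: MutableMapping[str, str] = os.environ
) -> str:
    """Set MLFLOW_EXPERIMENT_NAME unless it is already defined.

    Hypothetical helper for illustration only. Returns the experiment
    name that is in effect after the call.
    """
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("experiment name must not be empty")
    # setdefault keeps any value that was exported before the notebook ran,
    # so an externally chosen experiment name is not overwritten.
    return env.setdefault("MLFLOW_EXPERIMENT_NAME", cleaned)
```

Calling `set_experiment_name("fine-tuning")` before training starts would have the same effect as the notebook cell when the variable is unset. Whether an existing value should take precedence is a design choice; the unconditional assignment in the notebooks is the simpler alternative.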
+
+**To enable MLflow on your cluster:**
+
+1. Enable the MLflow Operator component in your `DataScienceCluster` CR:
+
+```bash
+oc patch datasciencecluster default-dsc \
+  --type=merge \
+  -p '{"spec":{"components":{"mlflowoperator":{"managementState":"Managed"}}}}'
+```
+
+2. Create an `MLflow` CR to deploy the tracking server (example using SQLite and a PV for storage):
+
+```bash
+oc apply -f - <<EOF
+apiVersion: mlflow.opendatahub.io/v1
+kind: MLflow
+metadata:
+  name: mlflow
+spec:
+  backendStoreUri: "sqlite:////mlflow/mlflow.db"
+  defaultArtifactRoot: "file:///mlflow/artifacts"
+  serveArtifacts: true
+  storage:
+    accessModes:
+      - ReadWriteOnce
+    resources:
+      requests:
+        storage: 10Gi
+EOF
+```
+
+For full details, see the [Configuring MLflow in OpenShift AI](https://access.redhat.com/articles/7136121) Knowledgebase article.
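The merge patch in step 1 is a small nested JSON document, and hand-quoting it in a shell is easy to get wrong. The sketch below builds the same argument list in Python so the JSON is generated rather than typed. `build_dsc_patch_command` is a hypothetical helper, and the default resource name `default-dsc` is taken from the command above; a cluster may use a different name. The code only constructs and prints the command; it does not run `oc`.

```python
import json
import shlex
from typing import List


def build_dsc_patch_command(
    dsc_name: str = "default-dsc", state: str = "Managed"
) -> List[str]:
    """Return the argv for the `oc patch` call shown in step 1.

    Hypothetical helper for illustration; nothing is executed here.
    """
    patch = {
        "spec": {
            "components": {"mlflowoperator": {"managementState": state}}
        }
    }
    # Compact separators reproduce the whitespace-free JSON used above.
    patch_json = json.dumps(patch, separators=(",", ":"))
    return [
        "oc", "patch", "datasciencecluster", dsc_name,
        "--type=merge", "-p", patch_json,
    ]


cmd = build_dsc_patch_command()
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would apply the patch, assuming `oc` is installed and logged in to the cluster.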
+
+## Viewing MLflow Experiments
+
+Once training completes with MLflow enabled, you can browse your experiment runs:
+
+1. In the OpenShift AI dashboard, navigate to **Develop & train → Experiments** from the left sidebar menu.
+2. Select the experiment name to view all runs.
+3. Each run contains logged metrics (training loss, learning rate), parameters, and artifacts.
+
+You can also launch the full MLflow UI by clicking the **"Launch MLflow"** link in the top right of the Experiments page:
+
+![MLflow experiments page](./images/mlflow-experiments.png)
+
+Each run logs metrics including training loss, learning rate, samples per second, and more:
+
+![MLflow run metrics](./images/mlflow-run-metrics.png)

examples/fine-tuning/osft/README.md

Lines changed: 1 addition & 63 deletions
@@ -210,66 +210,4 @@ You can now proceed with the instructions from the notebook. Enjoy!
 
 ## MLflow Integration (Optional)
 
-Training Hub supports [MLflow](https://mlflow.org/) for experiment tracking. When MLflow is enabled on your RHOAI cluster, training metrics (loss, learning rate, etc.) are automatically logged to MLflow experiments — no additional code changes required beyond setting the experiment name.
-
-> [!NOTE]
-> MLflow integration is available for **interactive (single node)** notebooks only. Distributed training jobs do not currently support MLflow tracking.
-
-### Enabling MLflow
-
-The interactive notebook already includes a cell that sets the MLflow experiment name:
-
-```python
-os.environ["MLFLOW_EXPERIMENT_NAME"] = "osft-training"
-```
-
-For this to work, MLflow must be enabled as a component in your RHOAI installation. If MLflow is not enabled, the environment variable is simply ignored and training proceeds normally.
-
-**To enable MLflow on your cluster:**
-
-1. Enable the MLflow Operator component in your `DataScienceCluster` CR:
-
-```bash
-oc patch datasciencecluster default-dsc \
-  --type=merge \
-  -p '{"spec":{"components":{"mlflowoperator":{"managementState":"Managed"}}}}'
-```
-
-2. Create an `MLflow` CR to deploy the tracking server (example using SQLite and a PV for storage):
-
-```bash
-oc apply -f - <<EOF
-apiVersion: mlflow.opendatahub.io/v1
-kind: MLflow
-metadata:
-  name: mlflow
-spec:
-  backendStoreUri: "sqlite:////mlflow/mlflow.db"
-  defaultArtifactRoot: "file:///mlflow/artifacts"
-  serveArtifacts: true
-  storage:
-    accessModes:
-      - ReadWriteOnce
-    resources:
-      requests:
-        storage: 10Gi
-EOF
-```
-
-For full details, see the [Configuring MLflow in OpenShift AI](https://access.redhat.com/articles/7136121) Knowledgebase article.
-
-### Viewing MLflow Experiments
-
-Once training completes with MLflow enabled, you can browse your experiment runs:
-
-1. In the OpenShift AI dashboard, navigate to **Develop & train → Experiments** from the left sidebar menu.
-2. Select the experiment name (e.g., `osft-training`) to view all runs.
-3. Each run contains logged metrics (training loss, learning rate), parameters, and artifacts.
-
-You can also launch the full MLflow UI by clicking the **"Launch MLflow"** link in the top right of the Experiments page:
-
-![](../images/mlflow-experiments.png)
-
-Each run logs metrics including training loss, learning rate, samples per second, and more:
-
-![](../images/mlflow-run-metrics.png)
+The interactive notebook supports optional MLflow experiment tracking. See the [MLflow Integration guide](../mlflow.md) for setup instructions and details.

examples/fine-tuning/sft/README.md

Lines changed: 1 addition & 63 deletions
@@ -157,66 +157,4 @@ You can now proceed with the instructions from the notebook. Enjoy!
 
 ## MLflow Integration (Optional)
 
-Training Hub supports [MLflow](https://mlflow.org/) for experiment tracking. When MLflow is enabled on your RHOAI cluster, training metrics (loss, learning rate, etc.) are automatically logged to MLflow experiments — no additional code changes required beyond setting the experiment name.
-
-> [!NOTE]
-> MLflow integration is available for **interactive (single node)** notebooks only. Distributed training jobs do not currently support MLflow tracking.
-
-### Enabling MLflow
-
-The interactive notebook already includes a cell that sets the MLflow experiment name:
-
-```python
-os.environ["MLFLOW_EXPERIMENT_NAME"] = "sft-training"
-```
-
-For this to work, MLflow must be enabled as a component in your RHOAI installation. If MLflow is not enabled, the environment variable is simply ignored and training proceeds normally.
-
-**To enable MLflow on your cluster:**
-
-1. Enable the MLflow Operator component in your `DataScienceCluster` CR:
-
-```bash
-oc patch datasciencecluster default-dsc \
-  --type=merge \
-  -p '{"spec":{"components":{"mlflowoperator":{"managementState":"Managed"}}}}'
-```
-
-2. Create an `MLflow` CR to deploy the tracking server (example using SQLite and a PV for storage):
-
-```bash
-oc apply -f - <<EOF
-apiVersion: mlflow.opendatahub.io/v1
-kind: MLflow
-metadata:
-  name: mlflow
-spec:
-  backendStoreUri: "sqlite:////mlflow/mlflow.db"
-  defaultArtifactRoot: "file:///mlflow/artifacts"
-  serveArtifacts: true
-  storage:
-    accessModes:
-      - ReadWriteOnce
-    resources:
-      requests:
-        storage: 10Gi
-EOF
-```
-
-For full details, see the [Configuring MLflow in OpenShift AI](https://access.redhat.com/articles/7136121) Knowledgebase article.
-
-### Viewing MLflow Experiments
-
-Once training completes with MLflow enabled, you can browse your experiment runs:
-
-1. In the OpenShift AI dashboard, navigate to **Develop & train → Experiments** from the left sidebar menu.
-2. Select the experiment name (e.g., `sft-training`) to view all runs.
-3. Each run contains logged metrics (training loss, learning rate), parameters, and artifacts.
-
-You can also launch the full MLflow UI by clicking the **"Launch MLflow"** link in the top right of the Experiments page:
-
-![](../images/mlflow-experiments.png)
-
-Each run logs metrics including training loss, learning rate, samples per second, and more:
-
-![](../images/mlflow-run-metrics.png)
+The interactive notebook supports optional MLflow experiment tracking. See the [MLflow Integration guide](../mlflow.md) for setup instructions and details.
