docs: add MLflow integration documentation to fine-tuning examples by briangallagher · Pull Request #1 · briangallagher/red-hat-ai-examples

briangallagher · 2026-05-20T13:22:52Z

Summary

- Adds an MLflow Integration section to the READMEs for the lora, osft, and sft fine-tuning examples
- Documents how to enable MLflow on the cluster, what the notebook cells do, and how to navigate to the Experiments page in the RHOAI dashboard
- Includes screenshots showing the Launch MLflow link and the left-nav path (Develop & train > Experiments)
- Clarifies that MLflow tracking is available for interactive (single-node) notebooks only; distributed training does not currently support it

The interactive notebooks already contain the MLFLOW_EXPERIMENT_NAME environment variable cells; this PR adds the missing README documentation so users know how to take advantage of the feature.
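
For readers who have not opened the notebooks, the kind of cell described above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the helper name `set_experiment_name` and the experiment name `fine-tuning-example` are hypothetical. The only detail taken from this PR is the `MLFLOW_EXPERIMENT_NAME` variable, which MLflow reads to pick an experiment when none has been set explicitly.

```python
import os

# Hypothetical experiment name, used only for illustration.
DEFAULT_EXPERIMENT = "fine-tuning-example"


def set_experiment_name(name: str = DEFAULT_EXPERIMENT, env=None) -> str:
    """Store the experiment name in MLFLOW_EXPERIMENT_NAME and return it.

    `env` defaults to the process environment; a plain dict can be passed
    instead to try the helper without touching real environment variables.
    """
    target = os.environ if env is None else env
    if not name.strip():
        raise ValueError("experiment name must not be empty")
    target["MLFLOW_EXPERIMENT_NAME"] = name
    return target["MLFLOW_EXPERIMENT_NAME"]


if __name__ == "__main__":
    # In a notebook this would run in a cell before training starts.
    print(set_experiment_name())
```

Setting the variable before any training code runs keeps the experiment name in one place, so later cells in the same kernel do not need to repeat it.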

Test plan

- Verified MLflow operator can be enabled on RHOAI 3.4 cluster (DataScienceCluster patched to Managed)
- Verified MLflow instance deploys successfully (v3.12.0, Available=True)
- Confirmed the Experiments page renders in the RHOAI dashboard with Launch MLflow link
- Screenshots captured from live cluster
- Run through an interactive notebook to confirm metrics appear in MLflow (manual test)
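
The "DataScienceCluster patched to Managed" step in the test plan can be pictured as a small merge patch. The sketch below only builds the patch body. It assumes the usual `spec.components.<key>.managementState` layout of a DataScienceCluster, and the component key is deliberately left as a caller-supplied argument because this PR description does not name it. The function name `build_dsc_patch` is hypothetical.

```python
import json

# States this sketch accepts; anything else is rejected early.
ALLOWED_STATES = {"Managed", "Removed"}


def build_dsc_patch(component_key: str, state: str = "Managed") -> str:
    """Return a JSON merge-patch body that sets one component's managementState.

    `component_key` is the component's key under spec.components. It is an
    assumption supplied by the caller, not a value taken from this PR.
    """
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported managementState: {state!r}")
    patch = {
        "spec": {
            "components": {
                component_key: {"managementState": state},
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Placeholder key; replace with the component key used on the cluster.
    print(build_dsc_patch("<component-key>"))
```

The resulting string could be passed to a command along the lines of `oc patch datasciencecluster <name> --type merge -p '<patch>'`, with `<name>` and the component key taken from the cluster.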

Made with Cursor

…validation

- Fix PVC mount path: notebook mounts shared PVC at /opt/app-root/src/shared
- Add NOTEBOOK_SHARED_PATH and TRAINING_POD_PATH for correct path handling
- Use PreTrainedTokenizerFast to bypass AutoTokenizer hub validation
- Load model config explicitly to avoid hub validation issues
- Set HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE env vars

- Add detailed model description (Qwen 2.5 1.5B Instruct)
- Add dataset documentation (Stanford Alpaca format and structure)
- Add training configuration tables with parameter explanations
- Add progress tracking architecture explanation
- Add PVC mounting and checkpoint structure documentation
- Add summary with what you accomplished and next steps
- Add quick reference tables for TransformersTrainer parameters

…nctionality

- Replace "Monitor Training Progress" section with "Follow Job Logs" for better context
- Stream job logs in real-time during training
- Add job status retrieval and detailed progress metrics display
- Improve error handling and namespace retrieval logic
- Clean up unnecessary imports and comments

… and clarity

- Update PVC mount paths to reflect SDK's automatic mounting at /mnt/kubeflow-checkpoints
- Clarify comments regarding model/data paths and checkpointing
- Improve documentation on checkpoint configuration and training arguments
- Ensure consistent output messages for checkpoints in both notebook and training pods

* Feature: AutoML time series forecasting tutorial (electricity sample) (red-hat-data-services#62)
* Replace main branch with rhoai-3.4 in autorag .md files
* docs: Updated Workbech section and documentation links (red-hat-data-services#69)
  Assisted By Cursor
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* automl time-series tutorial draft
  Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
  Assisted-by: Cursor
* Update time_series_forecasting_tutorial.md
  Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
  Assisted-by: Cursor
* promo known_covariates_names example added
  Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
  Assisted-by: Cursor
* replace the git urls from autox to main
  Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
  Assisted-by: Cursor
  # Conflicts:
  # examples/autorag/readme.md
* clean up the dev preview status mentions
  Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
  Assisted-by: Cursor
* adding time-series pipeline to readme
  Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
  Assisted-by: Cursor
* Change branch reference from main -> rhoai-3.4.
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* Improved formatting, and notebook section
  Assisted by Cursor
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* docs: Updated Workbech section and documentation links
  Assisted By Cursor
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* chore: AutoML examples update how to get artifacts
  Assisted by Cursor
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* chore: Updated Model registry anbd deployment steps
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

---------

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Co-authored-by: Michal Steczko <msteczko@redhat.com>
Co-authored-by: Dorota Laczak <dlaczak@redhat.com>

* updated images
* updated image
  Signed-off-by: ZabinskiMichal <mzabinsk@redhat.com>
* updated images
* updated pipeline (red-hat-data-services#72)
* Custom column names in TS scenario tutorial update (red-hat-data-services#73)
* updated tutorial
* deleted unnecessary instrucion part
* last small change
* Update documentation with the latest UI changes (red-hat-data-services#75)
  Signed-off-by: MichalSteczko <msteczko@redhat.com>
* chore: Removed pipeline.yaml file for AutoRAG (red-hat-data-services#79)
  Assisted by Cursor
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* docs(AutoML): Updated AutoML tutorials with UI path (red-hat-data-services#77)
  updatedTabular and TimeSeries tutorials to new UI flow
* chore: Fixed for issues found by Markdownlinter
  Assisted by Claude Code
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* chore: Fixed issues found by CodeRabbit in PR review.
  Assisted by Claude Code
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
* chore: Updated KServe AutoGluon Server repo link
  Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

---------

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Signed-off-by: ZabinskiMichal <mzabinsk@redhat.com>
Signed-off-by: MichalSteczko <msteczko@redhat.com>
Co-authored-by: Lukasz Cmielowski <lcmielow@redhat.com>
Co-authored-by: Michal Steczko <msteczko@redhat.com>
Co-authored-by: ZabinskiMichal <mzabinsk@redhat.com>
Co-authored-by: Michał Żabiński <85452231+ZabinskiMichal@users.noreply.github.com>

- Add shared MLflow guide (examples/fine-tuning/mlflow.md) covering enabling the operator, creating the CR, and viewing experiments
- Link to the shared guide from lora, osft, and sft READMEs
- Add screenshots showing the Experiments page and run metrics
- Note that the KB article link requires Red Hat Customer Portal login

Co-authored-by: Cursor <cursoragent@cursor.com>

kapil27 and others added 30 commits January 12, 2026 07:49

Add example for distributed training using TransformersTrainer on Red…

ae38da6

… Hat OpenShift AI

Add example notebook and README for real-time progress tracking with …

c0c1b8d

…Kubeflow Trainer

Improved markdown formatting.

6dac08d

Enhance progress tracking example notebook by fine-tuning the `ibm-gr…

9e6f7f3

…anite/granite-3.3-2b-instruct` model on the Alpaca dataset. Update training function and arguments for better demonstration, and improve markdown documentation for clarity.

fix: add blank lines around lists in README to satisfy markdownlint

195f458

Fixed MD032 errors by ensuring lists are surrounded by blank lines.

add rhoai 3.2 fine tuning examples

7cbcdce

Signed-off-by: Brian Gallagher <briangal@gmail.com>

address comments in review

58339a7

Signed-off-by: Brian Gallagher <briangal@gmail.com>

add RWX storage comment to rEADME

d51d314

Signed-off-by: Brian Gallagher <briangal@gmail.com>

Merge pull request red-hat-data-services#27 from briangallagher/add-3…

12861df

….2-finetuning-examples feat: add rhoai 3.2 fine tuning examples

Clear notebook cell outputs

c3963bb

Merge branch 'red-hat-data-services:main' into trainerv2-progress-tra…

ccd1be1

…cking

Add README and setup images for progress tracking example

4771817

- Add comprehensive README with setup instructions
- Add images for workbench setup walkthrough
- Include model/dataset documentation
- Add validation configuration
- Add TransformersTrainer quick reference

Fix ruff linting issues in progress tracking example

51290e4

- Fix syntax error: add missing if statement for final_path check
- Remove trailing whitespace from blank lines
- Remove unnecessary 'r' mode argument from open() calls

Update progress tracking example notebook to improve formatting

58e296d

Update README for progress tracking example to enhance formatting and…

70b7edb

… clarity

add initial osft minimal pipelines example

b40a1cc

Update examples/trainer/progress-tracking/progress-tracking-example.i…

15e0cc2

…pynb Co-authored-by: Rob Bell <robell@redhat.com>

Fix the versions for code-quality

9cc2cf9

Merge pull request red-hat-data-services#32 from tarun-etikala/update…

dab0c26

…-pre-commit Fix: Test case for no stored outputs and consistent versions for code-quality

update code and directory structure after review

dbea735

doc-updates-model-serve-1 (#1)

d639537

* doc-updates-model-serve-1
* doc-update minor fix module >step
* doc-update fix typos
* doc-updates change prereq to previous modules
* doc-update markdown in notebooks
* doc-update SME review comments

fix linting

cf0312e

fix linting

411d92c

fix file paths in readmes

bec7b54

further edits

c942df8

Merge pull request red-hat-data-services#29 from sanafayyaz315/featur…

bf52e66

…e/model_serve_1

szaher and others added 17 commits May 7, 2026 10:53

Merge branch 'main' into ray-rag-pipeline

140cefd

fix CI and update links

0716b5e

Signed-off-by: Saad Zaher <szaher@redhat.com>

refactor fine tuning examples

b9cd444

fix: sft example imports and setup fix

0e34ac0

further refactor to match 3.4

7b86a86

fix: failing lint test re imports in lora

809ab2b

fix: another linter issue

aec523a

docs: update readmes re hardware profiles following comments

cb029f0

docs: address coderabbit comments

7a97c0b

docs: address one more coderabbit comment

3d0f681

docs: update distributed notebooks based on comments

9830827

docs: updates to sft pvc path

52c27ab

docs: updates to absolute path following comments

21035cd

docs: revert changes to lora example

7127338

Merge pull request red-hat-data-services#64 from MStokluska/Fine-tuni…

68f3d30

…ng-3.4GA-examples Update for 3.4 and refactor to simplify

update pipeline parameters

990cb9e

Signed-off-by: Saad Zaher <szaher@redhat.com>

briangallagher force-pushed the add-mlflow-integration branch 2 times, most recently from 8b53222 to 704462c Compare May 21, 2026 09:31

szaher and others added 7 commits May 21, 2026 10:36

Merge branch 'main' into ray-rag-pipeline

ac00b94

add hf secret creation as part of the pipeline/readme

2e7efc1

Signed-off-by: Saad Zaher <szaher@redhat.com>

Merge branch 'ray-rag-pipeline' of github.com:red-hat-data-services/r…

7e653fa

…ed-hat-ai-examples into ray-rag-pipeline

add provider name before model name in ogx test

b127a21

Signed-off-by: Saad Zaher <szaher@redhat.com>

Merge pull request red-hat-data-services#78 from red-hat-data-service…

8124d3b

…s/ray-rag-pipeline add rag example using Ray Data, kfp, docling

RHOAIENG-59077: Add Ray Data & Docling RAG example

10a3f62

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request red-hat-data-services#85 from kryanbeane/RHOAIENG-…

6f714cc

…59077 RHOAIENG-59077: Add Ray Data & Docling RAG example

briangallagher force-pushed the add-mlflow-integration branch 2 times, most recently from e34a451 to 5b069d1 Compare June 9, 2026 07:04

briangallagher force-pushed the add-mlflow-integration branch from 55ebb20 to cd1c7ed Compare June 9, 2026 10:07

briangallagher wants to merge 118 commits into main from add-mlflow-integration

19 participants
