docs: add MLflow integration documentation to fine-tuning examples#1

Open
briangallagher wants to merge 118 commits into main from add-mlflow-integration

Conversation

@briangallagher

Owner

Summary

  • Adds an MLflow Integration section to the READMEs for lora, osft, and sft fine-tuning examples
  • Documents how to enable MLflow on the cluster, what the notebook cells do, and how to navigate to the Experiments page in the RHOAI dashboard
  • Includes screenshots showing the Launch MLflow link and the left-nav path (Develop & train > Experiments)
  • Clarifies that MLflow tracking is available for interactive (single-node) notebooks only — distributed training does not currently support it

The interactive notebooks already contain the MLFLOW_EXPERIMENT_NAME environment variable cells; this PR adds the missing README documentation so users know how to take advantage of the feature.
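As a rough illustration of the kind of cell referred to above, the sketch below sets `MLFLOW_EXPERIMENT_NAME` for the current notebook process. It is not copied from the notebooks: the helper name `configure_mlflow_experiment` and the experiment name `sft-interactive-demo` are hypothetical, and the actual cells may differ.

```python
import os

# Hypothetical experiment name, chosen for illustration only.
EXPERIMENT_NAME = "sft-interactive-demo"


def configure_mlflow_experiment(name: str) -> str:
    """Export MLFLOW_EXPERIMENT_NAME so MLflow-aware code in this process can read it."""
    os.environ["MLFLOW_EXPERIMENT_NAME"] = name
    # Read the value back so the caller can confirm what was set.
    return os.environ["MLFLOW_EXPERIMENT_NAME"]


if __name__ == "__main__":
    print(configure_mlflow_experiment(EXPERIMENT_NAME))
```

Setting the variable before training starts is the assumption here, so that any MLflow client created later in the same process picks up the experiment name.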

Test plan

  • Verified MLflow operator can be enabled on RHOAI 3.4 cluster (DataScienceCluster patched to Managed)
  • Verified MLflow instance deploys successfully (v3.12.0, Available=True)
  • Confirmed the Experiments page renders in the RHOAI dashboard with Launch MLflow link
  • Screenshots captured from live cluster
  • Run through an interactive notebook to confirm metrics appear in MLflow (manual test)

Made with Cursor

kapil27 and others added 30 commits January 12, 2026 07:49
…anite/granite-3.3-2b-instruct` model on the Alpaca dataset. Update training function and arguments for better demonstration, and improve markdown documentation for clarity.
Fixed MD032 errors by ensuring lists are surrounded by blank lines.
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
….2-finetuning-examples

feat: add rhoai 3.2 fine tuning examples
…validation

- Fix PVC mount path: notebook mounts shared PVC at /opt/app-root/src/shared
- Add NOTEBOOK_SHARED_PATH and TRAINING_POD_PATH for correct path handling
- Use PreTrainedTokenizerFast to bypass AutoTokenizer hub validation
- Load model config explicitly to avoid hub validation issues
- Set HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE env vars
- Add detailed model description (Qwen 2.5 1.5B Instruct)
- Add dataset documentation (Stanford Alpaca format and structure)
- Add training configuration tables with parameter explanations
- Add progress tracking architecture explanation
- Add PVC mounting and checkpoint structure documentation
- Add summary with what you accomplished and next steps
- Add quick reference tables for TransformersTrainer parameters
- Add comprehensive README with setup instructions
- Add images for workbench setup walkthrough
- Include model/dataset documentation
- Add validation configuration
- Add TransformersTrainer quick reference
- Fix syntax error: add missing if statement for final_path check
- Remove trailing whitespace from blank lines
- Remove unnecessary 'r' mode argument from open() calls
…nctionality

- Replace "Monitor Training Progress" section with "Follow Job Logs" for better context
- Stream job logs in real-time during training
- Add job status retrieval and detailed progress metrics display
- Improve error handling and namespace retrieval logic
- Clean up unnecessary imports and comments
… and clarity

- Update PVC mount paths to reflect SDK's automatic mounting at /mnt/kubeflow-checkpoints
- Clarify comments regarding model/data paths and checkpointing
- Improve documentation on checkpoint configuration and training arguments
- Ensure consistent output messages for checkpoints in both notebook and training pods
…pynb

Co-authored-by: Rob Bell <robell@redhat.com>
…-pre-commit

Fix: Test case for no stored outputs and consistent versions for code-quality
* doc-updates-model-serve-1

* doc-update minor fix module >step

* doc-update fix typos

* doc-updates change prereq to previous modules

* doc-update markdown in notebooks

* doc-update SME review comments
szaher and others added 17 commits May 7, 2026 10:53
Signed-off-by: Saad Zaher <szaher@redhat.com>
…ng-3.4GA-examples

Update for 3.4 and refactor to simplify
* Feature: AutoML time series forecasting tutorial (electricity sample) (red-hat-data-services#62)

* Replace main branch with rhoai-3.4 in autorag .md files

* docs: Updated Workbench section and documentation links (red-hat-data-services#69)

Assisted By Cursor

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* automl time-series tutorial draft

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* Update time_series_forecasting_tutorial.md

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* promo known_covariates_names example added

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* replace the git urls from autox to main

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

# Conflicts:
#	examples/autorag/readme.md

* clean up the dev preview status mentions

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* adding time-series pipeline to readme

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* Change branch reference from main -> rhoai-3.4.

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* Improved formatting, and notebook section

Assisted by Cursor

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* docs: Updated Workbench section and documentation links

Assisted By Cursor

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* chore: AutoML examples update how to get artifacts

Assisted by Cursor

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* chore: Updated Model registry and deployment steps

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

---------

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Co-authored-by: Michal Steczko <msteczko@redhat.com>
Co-authored-by: Dorota Laczak <dlaczak@redhat.com>

* updated images

* updated image

Signed-off-by: ZabinskiMichal <mzabinsk@redhat.com>

* updated images

* updated pipeline (red-hat-data-services#72)

* Custom column names in TS scenario tutorial update (red-hat-data-services#73)

* updated tutorial

* deleted unnecessary instruction part

* last small change

* Update documentation with the latest UI changes (red-hat-data-services#75)

Signed-off-by: MichalSteczko <msteczko@redhat.com>

* chore: Removed pipeline.yaml file for AutoRAG (red-hat-data-services#79)

Assisted by Cursor

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* docs(AutoML): Updated AutoML tutorials with UI path (red-hat-data-services#77)

Updated Tabular and TimeSeries tutorials to new UI flow

* chore: Fixed for issues found by Markdownlinter

Assisted by Claude Code

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* chore: Fixed issues found by CodeRabbit in PR review.

Assisted by Claude Code

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

* chore: Updated KServe AutoGluon Server repo link

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>

---------

Signed-off-by: Dorota Laczak <dlaczak@redhat.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Signed-off-by: ZabinskiMichal <mzabinsk@redhat.com>
Signed-off-by: MichalSteczko <msteczko@redhat.com>
Co-authored-by: Lukasz Cmielowski <lcmielow@redhat.com>
Co-authored-by: Michal Steczko <msteczko@redhat.com>
Co-authored-by: ZabinskiMichal <mzabinsk@redhat.com>
Co-authored-by: Michał Żabiński <85452231+ZabinskiMichal@users.noreply.github.com>
Signed-off-by: Saad Zaher <szaher@redhat.com>
@briangallagher briangallagher force-pushed the add-mlflow-integration branch 2 times, most recently from 8b53222 to 704462c Compare May 21, 2026 09:31
szaher and others added 7 commits May 21, 2026 10:36
Signed-off-by: Saad Zaher <szaher@redhat.com>
Signed-off-by: Saad Zaher <szaher@redhat.com>
…s/ray-rag-pipeline

add rag example using Ray Data, kfp, docling
Co-authored-by: Cursor <cursoragent@cursor.com>
…59077

RHOAIENG-59077: Add Ray Data & Docling RAG example
@briangallagher briangallagher force-pushed the add-mlflow-integration branch 2 times, most recently from e34a451 to 5b069d1 Compare June 9, 2026 07:04
- Add shared MLflow guide (examples/fine-tuning/mlflow.md) covering
  enabling the operator, creating the CR, and viewing experiments
- Link to the shared guide from lora, osft, and sft READMEs
- Add screenshots showing the Experiments page and run metrics
- Note that the KB article link requires Red Hat Customer Portal login

Co-authored-by: Cursor <cursoragent@cursor.com>
@briangallagher briangallagher force-pushed the add-mlflow-integration branch from 55ebb20 to cd1c7ed Compare June 9, 2026 10:07