docs: Extend CLI basic usage examples to all supported CLIs #4425

behroozazarkhalili · 2025-11-02T20:01:26Z

Resolves #4378

Summary

Extends the CLI documentation to include basic usage examples for GRPO, RLOO, and KTO, achieving parity with the existing SFT, DPO, and Reward examples.

Changes

Added GRPO examples using trl-lib/ultrafeedback-prompt dataset
Added RLOO examples using AI-MO/NuminaMath-TIR dataset
Added KTO examples using trl-lib/kto-mix-14k dataset
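
As a rough illustration of the subcommand-to-dataset pairing listed above, the sketch below assembles basic command lines in Python. It assumes the `--model_name_or_path` and `--dataset_name` flags used by the existing SFT and DPO examples carry over; the model value is a placeholder, since this PR description does not name one, and a real GRPO or RLOO run will likely need further arguments (for example a reward source) that are omitted here.

```python
import shlex

# Dataset used for each new CLI example, as listed in this PR description.
DATASETS = {
    "grpo": "trl-lib/ultrafeedback-prompt",
    "rloo": "AI-MO/NuminaMath-TIR",
    "kto": "trl-lib/kto-mix-14k",
}

# Placeholder only: the PR description does not specify a model.
PLACEHOLDER_MODEL = "<model-name-or-path>"


def build_command(subcommand: str, model: str = PLACEHOLDER_MODEL) -> str:
    """Return a shell-quoted `trl <subcommand>` command skeleton."""
    argv = [
        "trl",
        subcommand,
        "--model_name_or_path",
        model,
        "--dataset_name",
        DATASETS[subcommand],
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    for name in DATASETS:
        print(build_command(name))
```

The printed lines are skeletons to adapt, not the exact commands added to the documentation.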

All three CLIs now have examples in every documentation section:

Basic Usage
Configuration Files
Scaling with Accelerate (inline and config file variants)
Using --accelerate_config
Dataset Mixtures

Verification

✅ All 6 training CLIs (SFT, DPO, Reward, GRPO, RLOO, KTO) now have complete documentation coverage
✅ CLI commands verified to exist in trl/cli.py (lines 47-51)
✅ Datasets verified against official examples in examples/scripts/
✅ Markdown structure validated (all hfoptions blocks properly balanced)

Testing

Documentation structure was validated with a Python script that verifies:

All hfoptions blocks are balanced
All 6 CLIs present in each section
Correct datasets used consistently
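
The validation script itself is not included in this PR, so the following is only a minimal sketch of what the first two checks might look like. The function name `check_hfoptions` and the option ids (`SFT`, `DPO`, ...) are assumptions for illustration; the balance check compares tag counts rather than verifying true nesting, and the dataset-consistency check is not covered.

```python
import re

# Assumed option ids: one per training CLI named in this PR.
TRAINERS = ("SFT", "DPO", "Reward", "GRPO", "RLOO", "KTO")

OPEN_BLOCK = re.compile(r"<hfoptions\b[^>]*>")
CLOSE_BLOCK = re.compile(r"</hfoptions>")
OPTION = re.compile(r'<hfoption\s+id="([^"]+)"\s*>')
BLOCK = re.compile(r"<hfoptions\b[^>]*>(.*?)</hfoptions>", re.DOTALL)


def check_hfoptions(markdown: str, trainers=TRAINERS) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Check 1: every opening tag has a closing tag (count-based).
    opens = len(OPEN_BLOCK.findall(markdown))
    closes = len(CLOSE_BLOCK.findall(markdown))
    if opens != closes:
        problems.append(f"unbalanced <hfoptions>: {opens} open, {closes} close")

    option_opens = len(OPTION.findall(markdown))
    option_closes = markdown.count("</hfoption>")
    if option_opens != option_closes:
        problems.append(
            f"unbalanced <hfoption>: {option_opens} open, {option_closes} close"
        )

    # Check 2: each block offers one option per expected trainer.
    for index, match in enumerate(BLOCK.finditer(markdown), start=1):
        ids = OPTION.findall(match.group(1))
        for trainer in trainers:
            if trainer not in ids:
                problems.append(f"block {index}: missing option for {trainer}")

    return problems
```

Calling `check_hfoptions(open("path/to/page.md").read())` on a documentation page (the path here is hypothetical) would return an empty list when both checks pass.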

Resolves huggingface#4378

- Add GRPO CLI examples with trl-lib/ultrafeedback-prompt dataset
- Add RLOO CLI examples with AI-MO/NuminaMath-TIR dataset
- Add KTO CLI examples with trl-lib/kto-mix-14k dataset
- Add examples to all sections: Basic Usage, Config Files, Accelerate, accelerate_config, and dataset mixtures
- Ensure parity in documentation coverage across all 6 training CLIs
- Verified CLI commands exist in trl/cli.py (lines 47-51)
- Verified datasets match official examples in examples/scripts/

HuggingFaceDocBuilderDev · 2025-11-02T20:04:08Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergiopaniego

We could simplify the <hfoptions> by giving all of them the same id, so once you click on your preferred trainer, all of the options switch to it.

For the "... inline" and "... w/ config file" cases, we can merge each pair into a single expanded option that shows both variants one after the other, so the option labels are just the trainer names.

Addresses review comments on PR #4425:

1. Unified all <hfoptions> IDs to 'trainers' (5 sections)
   - Enables persistent trainer selection across all documentation sections
   - Improved user experience when navigating between examples
2. Consolidated inline/config file options (12 pairs → 12 single options)
   - Merged separate "inline" and "w/ config file" options
   - Used "Or with config file:" pattern for clarity
   - Applied to Sections 3 (Scaling) and 4 (Accelerate Config)

Result: 30 balanced options (6 trainers × 5 sections) with improved navigation
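
To make the unified layout concrete, here is a small sketch that renders one `<hfoptions>` block with the shared `trainers` id and one option per trainer. The helper name `render_block` and the placeholder snippet bodies are hypothetical; the actual commands live in the documentation page and are not reproduced here.

```python
def render_block(snippets: dict[str, str], block_id: str = "trainers") -> str:
    """Render one <hfoptions> block with one <hfoption> per trainer.

    Using the same block_id in every section is what lets a reader's
    trainer choice carry over from one section to the next.
    """
    lines = [f'<hfoptions id="{block_id}">']
    for trainer, body in snippets.items():
        lines += [f'<hfoption id="{trainer}">', "", body.strip(), "", "</hfoption>"]
    lines.append("</hfoptions>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder bodies: inline variant first, then the config-file variant.
    example = {
        name: "(inline command)\n\nOr with config file:\n\n(config-file command)"
        for name in ("SFT", "DPO", "Reward", "GRPO", "RLOO", "KTO")
    }
    print(render_block(example))
```

Rendering five such blocks, one per section, gives the 30 options mentioned in the commit message.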

sergiopaniego reviewed Nov 3, 2025


behroozazarkhalili added 2 commits November 3, 2025 09:56

Merge branch 'main' into docs/extend-cli-examples

25ff01d

Merge branch 'main' into docs/extend-cli-examples

d2a419f


3 participants
