@behroozazarkhalili
Collaborator

Resolves #4378

Summary

Extends the CLI documentation to include basic usage examples for GRPO, RLOO, and KTO, achieving parity with the existing SFT, DPO, and Reward examples.

Changes

  • Added GRPO examples using trl-lib/ultrafeedback-prompt dataset
  • Added RLOO examples using AI-MO/NuminaMath-TIR dataset
  • Added KTO examples using trl-lib/kto-mix-14k dataset

All three CLIs now have examples in every documentation section:

  • Basic Usage
  • Configuration Files
  • Scaling with Accelerate (inline and config file variants)
  • Using --accelerate_config
  • Dataset Mixtures

Verification

  • ✅ All 6 training CLIs (SFT, DPO, Reward, GRPO, RLOO, KTO) now have complete documentation coverage
  • ✅ CLI commands verified to exist in trl/cli.py (lines 47-51)
  • ✅ Datasets verified against official examples in examples/scripts/
  • ✅ Markdown structure validated (all hfoptions blocks properly balanced)

Testing

The documentation structure was validated with a Python script that checks:

  • All hfoptions blocks are balanced
  • All 6 CLIs present in each section
  • Correct datasets used consistently
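The script itself is not included in this PR. As a rough illustration only, a balance check of this kind could look like the sketch below. It assumes the docs use `<hfoptions id="...">` / `<hfoption id="...">` tags with matching closing tags; the function names `check_hfoptions` and `options_per_block` are hypothetical and do not come from the TRL repository.

```python
import re

# Matches opening and closing <hfoptions> / <hfoption> tags.
TAG_RE = re.compile(r"<(/?)(hfoptions|hfoption)\b[^>]*>")
# Matches only the singular <hfoption id="..."> opening tag.
OPTION_ID_RE = re.compile(r'<hfoption\s+id="([^"]+)"')


def check_hfoptions(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the tags are balanced."""
    problems = []
    stack = []  # names of the currently open tags, innermost last
    for match in TAG_RE.finditer(markdown):
        closing = match.group(1) == "/"
        name = match.group(2)
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}> at offset {match.start()}")
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems


def options_per_block(markdown: str) -> list[list[str]]:
    """List the hfoption ids found inside each <hfoptions> block, in order."""
    blocks = re.findall(
        r"<hfoptions\b[^>]*>(.*?)</hfoptions>", markdown, flags=re.DOTALL
    )
    return [OPTION_ID_RE.findall(block) for block in blocks]
```

Running `check_hfoptions` over the documentation file and comparing each entry of `options_per_block` against the six expected trainer names would cover the first two checks in the list above. The dataset check would need a separate lookup and is not sketched here.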

Resolves huggingface#4378

- Add GRPO CLI examples with trl-lib/ultrafeedback-prompt dataset
- Add RLOO CLI examples with AI-MO/NuminaMath-TIR dataset
- Add KTO CLI examples with trl-lib/kto-mix-14k dataset
- Add examples to all sections: Basic Usage, Config Files, Accelerate, accelerate_config, and dataset mixtures
- Ensure parity in documentation coverage across all 6 training CLIs
- Verified CLI commands exist in trl/cli.py (lines 47-51)
- Verified datasets match official examples in examples/scripts/
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sergiopaniego left a comment

We could simplify the <hfoptions> by giving all of them the same id, so once you click on your preferred trainer, all of the options switch to it.

For the "... inline" and "... w/ config file" pairs, we can merge each pair into a single expanded option that shows both variants one after the other, so the option labels are just the trainer names.

behroozazarkhalili added a commit that referenced this pull request Nov 3, 2025
Addresses review comments on PR #4425:

1. Unified all <hfoptions> IDs to 'trainers' (5 sections)
   - Enables persistent trainer selection across all documentation sections
   - Improved user experience when navigating between examples

2. Consolidated inline/config file options (12 pairs → 12 single options)
   - Merged separate "inline" and "w/ config file" options
   - Used "Or with config file:" pattern for clarity
   - Applied to Sections 3 (Scaling) and 4 (Accelerate Config)

Result: 30 balanced options (6 trainers × 5 sections) with improved navigation
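The stated result (one shared id, 6 trainers × 5 sections = 30 options) can be checked mechanically. The sketch below is an illustration under assumptions, not the script used for this PR: it assumes the `<hfoptions id="...">` / `<hfoption id="...">` tag syntax, the helper name `summarize` is hypothetical, and it builds a small synthetic document rather than reading the real docs file.

```python
import re
from collections import Counter

HFOPTIONS_ID_RE = re.compile(r'<hfoptions\s+id="([^"]+)"')
HFOPTION_ID_RE = re.compile(r'<hfoption\s+id="([^"]+)"')


def summarize(markdown: str):
    """Return (distinct group ids, number of groups, per-trainer option counts)."""
    group_ids = HFOPTIONS_ID_RE.findall(markdown)
    option_counts = Counter(HFOPTION_ID_RE.findall(markdown))
    return set(group_ids), len(group_ids), option_counts


# Synthetic stand-in for the docs page: 5 sections, each with 6 trainer options.
trainers = ["SFT", "DPO", "Reward", "GRPO", "RLOO", "KTO"]
section = (
    '<hfoptions id="trainers">\n'
    + "".join(f'<hfoption id="{t}">\n...\n</hfoption>\n' for t in trainers)
    + "</hfoptions>\n"
)
doc = section * 5

ids, n_groups, counts = summarize(doc)
assert ids == {"trainers"}          # every group shares the same id
assert n_groups == 5                # one group per documentation section
assert sum(counts.values()) == 30   # 6 trainers x 5 sections
```

Sharing one id across groups is what lets a reader's trainer choice carry over from one section to the next, which is the behavior the review comment asked for.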

Development

Successfully merging this pull request may close these issues.

Extend basic usage example to all supported CLIs