Motivation
All existing run scripts under `scripts/` require 4-8 GPUs (e.g., `run-qwen2.5-0.5B-reproducibility.sh` uses 8 GPUs, test scripts use 4 GPUs). This makes it difficult for new contributors to:
- Understand the slime training flow on a single development GPU
- Debug issues without reserving a multi-GPU node
- Quickly iterate when developing new features (e.g., adding model support)
Proposal
Add a minimal 1-GPU run script `scripts/run-qwen3-0.6B-minimal.sh` that:
- Uses Qwen3-0.6B (the smallest supported Qwen3 model)
- Runs on a single GPU with `--tensor-model-parallel-size 1` and `--pipeline-model-parallel-size 1`
- Uses small batch sizes (global batch size 16, rollout batch size 4) and a short response length (512 tokens)
- Covers the full GRPO training loop (rollout + training + eval) on GSM8K
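As a rough sketch, the script could look like the following. The parallelism and batch-size flags are the ones listed above; everything else (the entry point, model/data paths, and the remaining flag names) is a placeholder assumption modeled on the existing multi-GPU scripts and would need to be aligned with slime's actual CLI:

```shell
#!/bin/bash
# Hypothetical minimal 1-GPU GRPO run. Entry point, paths, and the
# batch/length flag names below are assumptions; only the parallelism
# sizes and the target values (16 / 4 / 512) come from this proposal.
set -euo pipefail

MODEL_PATH=${MODEL_PATH:-Qwen/Qwen3-0.6B}   # placeholder checkpoint location
DATA_PATH=${DATA_PATH:-gsm8k}               # placeholder dataset reference

python train.py \
  --model-path "${MODEL_PATH}" \
  --data-path "${DATA_PATH}" \
  --tensor-model-parallel-size 1 \
  --pipeline-model-parallel-size 1 \
  --global-batch-size 16 \
  --rollout-batch-size 4 \
  --max-response-len 512
```

Keeping every knob overridable via environment variables would let people on small GPUs shrink things further without editing the script.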
Existing alternatives
| Script | GPUs | Limitation |
|---|---|---|
| `scripts/run-qwen2.5-0.5B-reproducibility.sh` | 8 | Designed for reproducibility, not minimal debugging |
| `examples/true_on_policy/run_simple.py` | 1 (default) | Only covers true on-policy mode, not the standard training flow |
| `tests/test_qwen2.5_0.5B_debug_rollout_then_train.py` | 8 | Test-oriented, not a standalone learning script |
None of these provide a simple 1-GPU script for the standard training flow.
Additional Context
I'm happy to submit a PR for this if the team thinks it would be useful. The script would be a single new file with no changes to existing code.