Motivation
All existing run scripts under `scripts/` require 4-8 GPUs (e.g., `run-qwen2.5-0.5B-reproducibility.sh` uses 8 GPUs, test scripts use 4 GPUs). This makes it difficult for new contributors to:
- Understand the slime training flow on a single development GPU
- Debug issues without reserving a multi-GPU node
- Quickly iterate when developing new features (e.g., adding model support)
Proposal
Add a minimal 1-GPU run script `scripts/run-qwen3-0.6B-minimal.sh` that:
- Uses Qwen3-0.6B (the smallest supported Qwen3 model)
- Runs on a single GPU with `--tensor-model-parallel-size 1` and `--pipeline-model-parallel-size 1`
- Uses small batch sizes (global batch size 16, rollout batch size 4) and a short response length (512 tokens)
- Covers the full GRPO training loop (rollout + training + eval) on GSM8K
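As a rough sketch, the script could look like the following. The parallelism and batch-size flags are the ones listed above; everything else (the entry point, model/data paths, and the remaining flag names) is a placeholder assumption modeled on the existing multi-GPU scripts and would need to be aligned with slime's actual CLI:

```shell
#!/bin/bash
# Hypothetical minimal 1-GPU GRPO run. Entry point, paths, and the
# batch/length flag names below are assumptions; only the parallelism
# sizes and the target values (16 / 4 / 512) come from this proposal.
set -euo pipefail

MODEL_PATH=${MODEL_PATH:-Qwen/Qwen3-0.6B}   # placeholder checkpoint location
DATA_PATH=${DATA_PATH:-gsm8k}               # placeholder dataset reference

python train.py \
  --model-path "${MODEL_PATH}" \
  --data-path "${DATA_PATH}" \
  --tensor-model-parallel-size 1 \
  --pipeline-model-parallel-size 1 \
  --global-batch-size 16 \
  --rollout-batch-size 4 \
  --max-response-len 512
```

Keeping every knob overridable via environment variables would let people on small GPUs shrink things further without editing the script.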
Existing alternatives
| Script | GPUs | Limitation |
|---|---|---|
| `scripts/run-qwen2.5-0.5B-reproducibility.sh` | 8 | Designed for reproducibility, not minimal debugging |
| `examples/true_on_policy/run_simple.py` | 1 (default) | Only covers true on-policy mode, not the standard training flow |
| `tests/test_qwen2.5_0.5B_debug_rollout_then_train.py` | 8 | Test-oriented, not a standalone learning script |
None of these provide a simple 1-GPU script for the standard training flow.
Additional Context
I'm happy to submit a PR for this if the team thinks it would be useful. The script would be a single new file with no changes to existing code.