[Feature Request] Add minimal 1-GPU script for debugging and learning #1628

@ShanningZhuang

Description

Motivation

All existing run scripts under scripts/ require 4-8 GPUs (e.g., run-qwen2.5-0.5B-reproducibility.sh uses 8 GPUs, test scripts use 4 GPUs). This makes it difficult for new contributors to:

  • Understand the slime training flow on a single development GPU
  • Debug issues without reserving a multi-GPU node
  • Quickly iterate when developing new features (e.g., adding model support)

Proposal

Add a minimal 1-GPU run script scripts/run-qwen3-0.6B-minimal.sh that:

  • Uses Qwen3-0.6B (smallest supported Qwen3 model)
  • Runs on a single GPU with --tensor-model-parallel-size 1, --pipeline-model-parallel-size 1
  • Uses small batch sizes (global batch size 16, rollout batch size 4) and short response lengths (512 tokens)
  • Covers the full GRPO training loop (rollout + training + eval) on GSM8K
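A rough sketch of what such a script could look like. The parallelism flags come from the proposal above; the model/data paths and the batch-size and response-length flag names are placeholders modeled on typical slime/Megatron-style scripts and would need to be checked against the actual CLI:

```shell
#!/bin/bash
# Hypothetical sketch of scripts/run-qwen3-0.6B-minimal.sh.
# MODEL_PATH, DATA_PATH, the entry point, and several flag names below
# are assumptions -- verify them against the existing scripts/ before use.

set -euo pipefail

MODEL_PATH=${MODEL_PATH:-"Qwen/Qwen3-0.6B"}   # assumed HF model id
DATA_PATH=${DATA_PATH:-"./data/gsm8k"}        # assumed local GSM8K path

python train.py \
    --model-path "${MODEL_PATH}" \
    --data-path "${DATA_PATH}" \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --global-batch-size 16 \
    --rollout-batch-size 4 \
    --max-response-len 512
```

The key point is that every parallelism dimension is pinned to 1 and the batch sizes are small enough for a single development GPU.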

Existing alternatives

| Script | GPUs | Limitation |
| --- | --- | --- |
| `scripts/run-qwen2.5-0.5B-reproducibility.sh` | 8 | Designed for reproducibility, not minimal debugging |
| `examples/true_on_policy/run_simple.py` | 1 (default) | Only covers true on-policy mode, not the standard training flow |
| `tests/test_qwen2.5_0.5B_debug_rollout_then_train.py` | 8 | Test-oriented, not a standalone learning script |

None of these provide a simple 1-GPU script for the standard training flow.

Additional Context

I'm happy to submit a PR for this if the team thinks it would be useful. The script would be a single, self-contained file requiring no changes to existing code.
