AReaL is a distributed RL training framework for LLM alignment via reinforcement learning.
Tech Stack: Python 3.12+ | PyTorch | FSDP2/Megatron | SGLang/vLLM
Core Directories:
areal/- Core packageapi/- Config dataclasses, workflow/engine contractsengine/- FSDP2, Megatron, SGLang/vLLM adaptersfsdp_utils/- FSDP2-specific utilities (checkpoint, grad, optimizer, parallel)megatron_utils/- Megatron/FP8 utilities (checkpoint, pipeline, quantization)core/- Engine-shared utilities (distributed, lock, model, offload)
infra/- Infrastructure (launcher, scheduler, RPC)utils/- Infrastructure utilities (launcher, proc, http, concurrent, slurm, ray)
workflow/- RolloutWorkflow implementationsreward/- Reward functionsdataset/- Dataset loadersutils/- Cross-cutting utilities (logging, data, checkpoints, network, RL functional)
examples/- Training scripts and configsdocs/- Jupyter Book source
- Enable efficient RL training for LLM alignment at scale
- Async rollout + distributed training for high throughput
- Modular design: workflows, engines, rewards, and datasets are independently extensible
# Check environment
python --version # Requires 3.12+
uv --version # Install: https://docs.astral.sh/uv/
# Sync dependencies
uv sync --extra cuda # CUDA + SGLang inference (default)
uv sync --group dev # Include dev/test packages
uv run python3 areal/tools/validate_installation.py # Validate installation
# Pre-commit hooks
pre-commit install --install-hooks # Set up hooks (run once)
pre-commit run --all-files # Format and lint
# Run tests
# First check GPU availability (many tests require GPU)
python -c "import torch; print('GPU available:', torch.cuda.is_available())"
uv run pytest tests/test_<topic>.py
# Generate CLI docs
uv run python docs/generate_cli_docs.py
# Build docs (canonical, release-aligned)
./docs/build_all.sh
# Do NOT use `jupyter-book build docs/en|docs/zh` directly for final preview/release,
# because it skips AReaL-specific static setup and output packaging.- Designed for distributed GPU clusters; assume containerized execution
- Integration tests require multi-node hardware; explain skips when unavailable
- Secrets and endpoints are managed outside the repo
- Read relevant files before modifying code
- Run
pre-commit run --all-filesbefore committing - Follow existing code patterns in the same module
- Add tests for new functionality
- Modifying config structures in
areal/api/cli_args.py - Adding new dependencies
- Changing launcher or scheduler logic
- Deleting or renaming public APIs
- Running GPU/distributed tests (check GPU first:
python -c "import torch; print('GPU available:', torch.cuda.is_available())")
- Hardcode secrets, paths, or endpoints
- Skip pre-commit hooks
- Guess cluster configs or rebuild CUDA/driver stacks
- Use wildcard imports (
from x import *)
| Task | Reference |
|---|---|
| Add Workflow | docs/customization/agent.md, areal/workflow/multi_turn.py |
| Add Dataset | docs/customization/, areal/dataset/gsm8k.py |
| Add Reward | areal/api/reward_api.py, areal/reward/geometry3k.py |
| Add Archon Model | areal/experimental/models/archon/qwen2/, qwen3/ |
| Algorithm Details | docs/algorithms/*.md |
| Quickstart | docs/tutorial/quickstart.md |
| Architecture Deep Dive | docs/tutorial/gsm8k_grpo.md |
| CLI Reference | docs/cli_reference.md |
- Commits: Conventional Commits (
feat:,fix:,docs:), ~72 chars subject, imperative voice, reasoning in body - Squash: Squash WIP commits before opening PR
- PR requirements: Run pre-commit, document test coverage, note hardware limitations
See .claude/agents/, .claude/skills/, .claude/commands/, and .claude/rules/ for
specialized instructions.
| Agent | Purpose | Activation Trigger |
|---|---|---|
planner |
Implementation planning | Before multi-file changes, new features, or architectural decisions |
simple-code-reviewer |
Quick code quality checks | After code changes, before committing |
code-verifier |
Formatting/linting/tests | After code changes, before committing |
fsdp-engine-expert |
FSDPEngine implementation | FSDPEngine code changes or questions |
archon-engine-expert |
ArchonEngine implementation | ArchonEngine code changes or questions |
megatron-engine-expert |
MegatronEngine implementation | MegatronEngine code changes or questions |
algorithm-expert |
RL algorithms | GRPO/PPO/DAPO questions |
launcher-scheduler-expert |
Cluster launching and resource scheduling | Launcher/scheduler code changes or configuration questions |
Stage-by-Stage Agent Guidance:
- Planning Stage (Before coding): Use
plannerfor architecture design and implementation planning - Code Formatting & Linting (After coding): Use
code-verifierto automatically run formatting, linting, and tests, catching syntax errors and style issues quickly - Code Quality Check (After formatting): Use
simple-code-reviewerfor quick code quality checks, focusing on logic issues and code smells
Skills provide step-by-step guides for common development tasks:
/add-dataset- Dataset loader creation guide/add-workflow- Workflow implementation guide/add-reward- Reward function guide/add-archon-model- Archon engine model architecture guide/debug-distributed- Distributed debugging guide/add-unit-tests- Test development guide (NEW)
Commands perform specific actions when invoked:
/create-pr- Rebase, squash commits, and create/update PR with intelligent messages/gen-commit-msg- Generate commit messages from staged changes/review-pr- Intelligent PR code review with dynamic agent allocation/translate-doc-zh- Translate English documentation to Chinese
Project-wide standards enforced across all code changes:
api-config.md- Configuration dataclass design patternscode-style.md- Coding conventions beyond pre-commit hooksdistributed.md- Distributed training patterns and constraintstesting.md- Testing strategy and coverage requirements