Welcome to the AReaL Quickstart Guide! This guide demonstrates how to run an AReaL experiment training an LLM on the GSM8K dataset using the GRPO algorithm with function-based rewards. Ensure you've completed the installation and environment setup before proceeding.
To run the experiment, you will need:
- Training script: examples/math/gsm8k_rl.py
- Config YAML: examples/math/gsm8k_grpo.yaml
Our training scripts will automatically download the dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run the example with default configuration, execute from the repository directory:
python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml scheduler.type=local experiment_name=<your experiment name> trial_name=<your trial name>
Note: For distributed experiments across multiple nodes, see Distributed Experiments with Ray or Slurm.
All available configuration options are listed in areal/api/cli_args.py. To customize the experiment (models, resources, algorithm options), you can:
- Edit the YAML file directly at examples/math/gsm8k_grpo.yaml.
- Add command-line options:
  - For options that already exist in the YAML file, set them directly: actor.path=Qwen/Qwen3-1.7B.
  - For options defined in cli_args.py but not present in the YAML file, add them with a "+" prefix: +sglang.attention_backend=triton.
For example, here is the command to launch a customized configuration, based on our GSM8K GRPO example:
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=local \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang:d2p1t1+d2p1t1 \
cluster.n_nodes=1 \
cluster.n_gpus_per_node=4 \
gconfig.max_new_tokens=2048 \
train_dataset.batch_size=1024 \
+sglang.attention_backend=triton
To enable Hugging Face Kernels in training, add the train engine overrides explicitly:
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=local \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
+actor.attn_impl=kernels-community/flash-attn \
+actor.use_kernels=true

Apply the same overrides to the critic or teacher engines if they should also use kernels.
(distributed-experiments-with-ray-or-slurm)=
For distributed experiments across multiple nodes, you can use Ray or Slurm schedulers. After setting up your Ray or Slurm cluster, launch experiments by specifying the appropriate scheduler type:
# Launch with Ray scheduler. 4 nodes (4 GPUs each), 3 nodes for generation, 1 node for training.
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=ray \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang:d12p1t1+d4p1t1 \
cluster.n_nodes=4 \
cluster.n_gpus_per_node=4
# Launch with Slurm scheduler. 16 nodes (8 GPUs each), 12 nodes for generation, 4 nodes for training
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=slurm \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang:d96p1t1+d32p1t1 \
cluster.n_nodes=16 \
cluster.n_gpus_per_node=8
Additional references:
- For more scheduler options, check SchedulerConfig in areal/api/cli_args.py.
- See the Ray cluster setup guide in installation.md for instructions on setting up a Ray cluster.
Important Note: Ensure allocation_mode matches your cluster configuration (total #GPUs == cluster.n_nodes * cluster.n_gpus_per_node).
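As a sanity check, you can compute the GPU count implied by an allocation_mode string yourself. Here is a minimal sketch; it assumes each dXpYtZ component occupies X * Y * Z GPUs, which matches the examples above, but this helper is illustrative and not AReaL's actual parser:

```python
import re

def total_gpus(allocation_mode: str) -> int:
    """Sum the GPUs implied by an allocation_mode string such as
    'sglang:d12p1t1+d4p1t1' (data x pipeline x tensor per component).

    Illustrative assumption based on the examples above, not AReaL's
    actual parsing code.
    """
    spec = allocation_mode.split(":", 1)[-1]  # drop the 'sglang:' backend prefix
    total = 0
    for component in spec.split("+"):
        d, p, t = map(int, re.match(r"d(\d+)p(\d+)t(\d+)", component).groups())
        total += d * p * t
    return total

# Ray example: 12 generation GPUs + 4 training GPUs on 4 nodes x 4 GPUs.
assert total_gpus("sglang:d12p1t1+d4p1t1") == 4 * 4
# Slurm example: 96 + 32 GPUs on 16 nodes x 8 GPUs.
assert total_gpus("sglang:d96p1t1+d32p1t1") == 16 * 8
```

If the computed total does not equal cluster.n_nodes * cluster.n_gpus_per_node, the launch will not fit your cluster.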
AReaL also supports SPMD (Single Program Multiple Data) mode via dedicated launchers.
This mode is maintained for backwards compatibility but the single-controller mode
(direct script execution with scheduler.type) is now the recommended approach for most
use cases.
In SPMD mode, the launcher manages process spawning via torchrun and sets
AREAL_SPMD_MODE=1. Each GPU worker runs the full training script independently, with
coordination handled through PyTorch distributed primitives.
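Because the SPMD launchers set AREAL_SPMD_MODE=1, a training script can inspect that variable to branch between the two modes. A minimal sketch (the helper name here is ours, not part of AReaL's API):

```python
import os

def in_spmd_mode() -> bool:
    """Return True when a legacy SPMD launcher spawned this process.

    The SPMD launchers set AREAL_SPMD_MODE=1 before running the training
    script under torchrun; in single-controller mode the variable is unset.
    Illustrative helper, not part of AReaL's API.
    """
    return os.environ.get("AREAL_SPMD_MODE") == "1"
```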
# SPMD mode with local launcher (legacy)
python3 -m areal.infra.launcher.local examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml
# SPMD mode with Ray launcher (legacy)
python3 -m areal.infra.launcher.ray examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml
# SPMD mode with Slurm launcher (legacy)
python3 -m areal.infra.launcher.slurm examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml

If you want to run an experiment directly on the cloud or your own Kubernetes
infrastructure, we recommend using SkyPilot. After installing and setting up
SkyPilot (see {ref}Install SkyPilot <install-skypilot>), you could launch a
distributed experiment based on our SkyPilot example (two 8xA100 GPU nodes) with one
command line:
# Launch on GCP
sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra gcp
# Launch on AWS
sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra aws
# Launch on your K8s Cluster
sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra k8s

See Running AReaL with SkyPilot for more details about these examples, and the SkyPilot Documentation for more information about SkyPilot.
Check Getting Started with AReaL for a complete code walkthrough on the GRPO GSM8K Example.
Customization guides: