Welcome to the AReaL Quickstart Guide! This guide demonstrates how to run an AReaL experiment training an LLM on the GSM8K dataset using the GRPO algorithm with function-based rewards. Ensure you've completed the installation and environment setup before proceeding.
To run the experiment, you will need:
- Training script: examples/math/gsm8k_rl.py
- Config YAML: examples/math/gsm8k_grpo.yaml
Our training scripts will automatically download the dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run the example with default configuration, execute from the repository directory:
python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml scheduler.type=local experiment_name=<your experiment name> trial_name=<your trial name>
Note: For distributed experiments across multiple nodes, see Distributed Experiments with Ray or Slurm.
All available configuration options are listed in areal/api/cli_args.py. To customize the experiment (models, resources, algorithm options), you can:
- Edit the YAML file directly at examples/math/gsm8k_grpo.yaml.
- Add command-line options:
  - For options that already exist in the YAML file, set them directly: actor.path=Qwen/Qwen3-1.7B.
  - For options defined in cli_args.py but not present in the YAML file, add them with a "+" prefix: +sglang.attention_backend=triton.
For example, here is the command to launch a customized configuration, based on our GSM8K GRPO example:
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=local \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang:d2p1t1+d2p1t1 \
cluster.n_nodes=1 \
cluster.n_gpus_per_node=4 \
gconfig.max_new_tokens=2048 \
train_dataset.batch_size=1024 \
+sglang.attention_backend=triton
To enable Hugging Face Kernels in training, add the train engine overrides explicitly:
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=local \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
+actor.attn_impl=kernels-community/flash-attn \
+actor.use_kernels=true

Apply the same overrides to the critic or teacher engines if they should also use kernels.
(distributed-experiments-with-ray-or-slurm)=
For distributed experiments across multiple nodes, you can use Ray or Slurm schedulers. After setting up your Ray or Slurm cluster, launch experiments by specifying the appropriate scheduler type:
# Launch with Ray scheduler. 4 nodes (4 GPUs each), 3 nodes for generation, 1 node for training.
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=ray \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang:d12p1t1+d4p1t1 \
cluster.n_nodes=4 \
cluster.n_gpus_per_node=4
# Launch with Slurm scheduler. 16 nodes (8 GPUs each), 12 nodes for generation, 4 nodes for training
python3 examples/math/gsm8k_rl.py \
--config examples/math/gsm8k_grpo.yaml \
scheduler.type=slurm \
experiment_name=<your experiment name> \
trial_name=<your trial name> \
allocation_mode=sglang:d96p1t1+d32p1t1 \
cluster.n_nodes=16 \
cluster.n_gpus_per_node=8
Additional references:
- For more scheduler options, check SchedulerConfig in areal/api/cli_args.py.
- See the Ray cluster setup guide in installation.md for instructions on setting up a Ray cluster.
Important Note: Ensure allocation_mode matches your cluster configuration (total #GPUs == cluster.n_nodes * cluster.n_gpus_per_node).
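As a sanity check, you can compute the GPU count implied by an allocation_mode string yourself. Here is a minimal sketch; it assumes each dXpYtZ component occupies X * Y * Z GPUs, which matches the examples above, but this helper is illustrative and not AReaL's actual parser:

```python
import re

def total_gpus(allocation_mode: str) -> int:
    """Sum the GPUs implied by an allocation_mode string such as
    'sglang:d12p1t1+d4p1t1' (data x pipeline x tensor per component).

    Illustrative assumption based on the examples above, not AReaL's
    actual parsing code.
    """
    spec = allocation_mode.split(":", 1)[-1]  # drop the 'sglang:' backend prefix
    total = 0
    for component in spec.split("+"):
        d, p, t = map(int, re.match(r"d(\d+)p(\d+)t(\d+)", component).groups())
        total += d * p * t
    return total

# Ray example: 12 generation GPUs + 4 training GPUs on 4 nodes x 4 GPUs.
assert total_gpus("sglang:d12p1t1+d4p1t1") == 4 * 4
# Slurm example: 96 + 32 GPUs on 16 nodes x 8 GPUs.
assert total_gpus("sglang:d96p1t1+d32p1t1") == 16 * 8
```

If the computed total does not equal cluster.n_nodes * cluster.n_gpus_per_node, the launch will not fit your cluster.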
AReaL also supports SPMD (Single Program Multiple Data) mode via dedicated launchers.
This mode is maintained for backwards compatibility but the single-controller mode
(direct script execution with scheduler.type) is now the recommended approach for most
use cases.
In SPMD mode, the launcher manages process spawning via torchrun and sets
AREAL_SPMD_MODE=1. Each GPU worker runs the full training script independently, with
coordination handled through PyTorch distributed primitives.
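Because the SPMD launchers set AREAL_SPMD_MODE=1, a training script can inspect that variable to branch between the two modes. A minimal sketch (the helper name here is ours, not part of AReaL's API):

```python
import os

def in_spmd_mode() -> bool:
    """Return True when a legacy SPMD launcher spawned this process.

    The SPMD launchers set AREAL_SPMD_MODE=1 before running the training
    script under torchrun; in single-controller mode the variable is unset.
    Illustrative helper, not part of AReaL's API.
    """
    return os.environ.get("AREAL_SPMD_MODE") == "1"
```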
# SPMD mode with local launcher (legacy)
python3 -m areal.infra.launcher.local examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml
# SPMD mode with Ray launcher (legacy)
python3 -m areal.infra.launcher.ray examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml
# SPMD mode with Slurm launcher (legacy)
python3 -m areal.infra.launcher.slurm examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml

If you want to run an experiment directly on the cloud or your own Kubernetes
infrastructure, we recommend using SkyPilot. After installing and setting up
SkyPilot (see {ref}Install SkyPilot <install-skypilot>), you could launch a
distributed experiment based on our SkyPilot example (two 8xA100 GPU nodes) with one
command line:
# Launch on GCP
sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra gcp
# Launch on AWS
sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra aws
# Launch on your K8s Cluster
sky launch -c areal-test examples/skypilot/ray_cluster.sky.yaml --infra k8s

See Running AReaL with SkyPilot for more details about these examples, and the SkyPilot Documentation for more information about SkyPilot.
Check Getting Started with AReaL for a complete code walkthrough on the GRPO GSM8K Example.
Customization guides: