PRIME-RL is a framework for large-scale asynchronous reinforcement learning. It is designed to be easy to use and hackable, yet capable of scaling to 1000+ GPUs. Beyond that, here is why we think you might like it:
- Integrates natively with `verifiers` environments via the Environments Hub
- Supports end-to-end post-training, including SFT, RL training, and evals
- Multi-node deployment with FSDP2 training and vLLM inference backends
- Designed for asynchronous training in decentralized settings
- Hackable, modular and extensible by nature
We develop and test on NVIDIA RTX 3090/4090/5090, A100, H100, H200, and B200. If your setup fails, please create an issue.
Currently, you need at least one NVIDIA GPU to use PRIME-RL. If you don't already have access to one, we recommend our compute platform for everything from renting on-demand single GPUs for developing, debugging and small ablations, to reserving 1000+ GPU clusters for production-scale training.
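If you are unsure whether your machine qualifies, a quick `nvidia-smi` should list at least one NVIDIA GPU before you proceed:

```bash
# Lists visible NVIDIA GPUs and the installed driver version
nvidia-smi
```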
Set up PRIME-RL in a single command.
```bash
curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh | bash
```
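If you'd rather not pipe a script straight into your shell, you can download and review the installer first (same script, just fetched to disk):

```bash
# Fetch the install script, inspect it, then run it
curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh -o install.sh
less install.sh
bash install.sh
```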
## Manual Setup

1. Clone the repository
   ```bash
   git clone https://github.com/PrimeIntellect-ai/prime-rl.git
   cd prime-rl
   ```

2. Install uv
   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   source $HOME/.local/bin/env
   ```

3. Install dependencies from the lock file
   ```bash
   uv sync
   ```

## Validate your environment setup
1. Check that the environment uses Python 3.12
   ```bash
   uv run python -V
   ```

2. Check that `flash-attn` is installed
   ```bash
   uv run python -c "import flash_attn"
   ```

3. Check that you can run the SFT trainer (this requires 1 GPU)
   ```bash
   uv run sft @ configs/debug/sft/train.toml
   ```

4. Check that you can run the RL trainer (this requires 1 GPU)
   ```bash
   uv run trainer @ configs/debug/rl/train.toml
   ```

5. Check that you can run the inference server (this requires 1 GPU)
   ```bash
   uv run inference @ configs/debug/infer.toml
   ```

   Keep the inference server running in the background for the next steps (see the sketch after this list for one way to do that).
   5.1. Check that you can run the orchestrator against the inference server
   ```bash
   uv run orchestrator @ configs/debug/orch.toml
   ```

   5.2. Check that you can run evals against the inference server
   ```bash
   uv run eval @ configs/debug/eval.toml
   ```

6. If you want to log your runs to W&B, log in
   ```bash
   uv run wandb login
   # Or set `export WANDB_API_KEY=...`
   ```

7. If you require gated or private models or datasets from Hugging Face, log in
   ```bash
   uv run hf auth login
   # Or set `export HF_TOKEN=...`
   ```
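As referenced in step 5, one way to keep the inference server alive while you run the orchestrator and evals is to background it and watch its logs (a plain-shell sketch; a tmux or screen session works just as well):

```bash
# Start the inference server in the background, logging to a file
nohup uv run inference @ configs/debug/infer.toml > inference.log 2>&1 &
# Follow the logs until the server reports it is ready
tail -f inference.log
```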
We provide end-to-end training examples in the examples directory to highlight features of the framework and guide you through the process of training your own models.

- Reverse Text: Train `Qwen3-0.6B` to reverse a small chunk of text. Demonstrates tiny-scale single-turn SFT and RL training. Can be trained on a single consumer GPU in a few minutes, and is ideal for getting started.
- Wordle: Train `Qwen3-1.7B` to play Wordle. A fun example of multi-turn SFT and RL training. Can be trained on 2-4 H100 GPUs in a few hours. Ideal for exploring the multi-turn training capabilities of the framework.
- Alphabet Sort: Train `Qwen3-4B-Instruct-2507` to sort names alphabetically. Demonstrates multi-turn RL training via LoRA without SFT warmup. Can be trained on a single H100 GPU in just over an hour. Ideal for exploring LoRA-based training.
- More to come...
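Each example ships its own configs in the examples directory, and you run them with the same entrypoints used above. The path below is illustrative only; check the example's directory for the exact file names:

```bash
# Hypothetical config path -- see the examples directory for the real one
uv run sft @ examples/reverse_text/sft.toml
```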
Check out the docs directory for in-depth guides on how to use PRIME-RL.
- Entrypoints - Overview of the main components (orchestrator, trainer, inference) and how to run SFT, RL, and evals
- Configs - Configuration system using TOML files, CLI arguments, and environment variables
- Environments - Installing and using verifiers environments from the Environments Hub
- Async Training - Understanding asynchronous off-policy training and step semantics
- Logging - Logging with loguru, torchrun, and Weights & Biases
- Checkpointing - Saving and resuming training from checkpoints
- Benchmarking - Performance benchmarking and throughput measurement
- Deployment - Training deployment on single-GPU, multi-GPU, and multi-node clusters
- Troubleshooting - Common issues and their solutions
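As a taste of the config system, every entrypoint accepts a TOML file via the `@` syntax used throughout this README. A minimal sketch (the copied file name is arbitrary; see the Configs guide for CLI and environment-variable overrides):

```bash
# Copy a known-good debug config and point the trainer at your edited copy
cp configs/debug/rl/train.toml my-train.toml
uv run trainer @ my-train.toml
```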
We warmly welcome community contributions! We use issues to track bugs and feature requests, and to share our internal roadmap. If you encounter bugs, have pain points during development, or have ideas for new features, please open an issue.
Contributions are welcome via PR. Please follow these guidelines:
- Install the pre-commit hooks to ensure your code is formatted correctly.
- Please keep your PR in "Draft" until it is ready for review.
- If your PR resolves an issue, please link the issue in the PR description.
- If you can, try running the test suite locally to ensure your changes are working as expected.
Please install the pre-commit hooks to ensure your code is formatted correctly.
```bash
uv run pre-commit install
```
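After installing the hooks, you can also run them once over the entire tree (a standard pre-commit invocation, not specific to this repo):

```bash
# Run all configured hooks against every file in the repository
uv run pre-commit run --all-files
```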
Run the full test suite

```bash
uv run pytest -v
```

To run unit tests, run
```bash
uv run pytest tests/unit -v
```

To run integration tests, run
```bash
uv run pytest tests/integration -v
```

To run CPU-only tests, use the inverse of the `gpu` marker:
```bash
uv run pytest -v -m "not gpu"
```

This project is licensed under the Apache 2.0 license, as found in the License file.
If you find our work useful, feel free to cite it using
```bibtex
@misc{primeintellect2025prime-rl,
  author = {Prime Intellect},
  title = {PRIME-RL},
  url = {https://github.com/PrimeIntellect-ai/prime-rl},
  year = {2025}
}
```