r2l - a Rust Reinforcement Learning Library

[!WARNING] > Pre-Alpha: This library is under active development. APIs may change, documentation is sparse, features may be added or removed, and bugs should be expected.

r2l is a minimalist reinforcement learning library written in Rust, designed to be customizable, ergonomic, and easily embeddable.

Why r2l

Despite the simple nature of most reinforcement learning algorithms, existing implementations often introduce hidden complexity. Libraries like Stable Baselines3 include many build-in features that are not part of the original design and makes reasoning about the code harder. r2l takes a different approach by exposing key parts of the training loop through a hook system. This allows for more flexibility in logging, analysis and experimentation. Furthermore the hook system allows easy integration into the wider Rust ecosystem, allowing for the easy creation of TUI/GUI applications.

An example of embedding r2l in a terminal application

What's included

r2l is organized as a set of crates:

r2l-core – Defines the core abstractions for reinforcement learning (environments, policy, ...). Also contains a minimal set of on-policy algorithms like PPO.
r2l-gym – A wrapper around Python's Gym environments.
r2l-macros – Provides procedural macros to simplify implementing and registering hooks.
r2l-tui – A TUI application that demonstrates how to embed r2l into ratatui.

How to use it

Set up a PPO agent:

let env = GymEnv::new("Pendulum-v1", None, &Device::Cpu)?;
let (input_dim, out_dim) = env.io_sizes();
let builder = PPOBuilder {
    input_dim,
    out_dim,
    ..Default::default()
};
let mut agent = builder.build()?;

Adding a batching hook

Suppose you want to observe and print the policy loss during training. You can define a batching hook like this:

fn batch_hook(policy_loss: &mut PolicyLoss) -> candle_core::Result<bool> {
    println!("policy loss: {}", policy_loss.to_scalar::<f32>()?);
    Ok(false)
}

Register it with the agent:

agent.hooks.set_batching_hook(batch_hook);

Hook function flexibility

Batch hook functions can accept any subset of the following arguments:

policy: &mut PolicyKind,
rollout_batch: &RolloutBatch,
policy_loss: &mut PolicyLoss,
value_loss: &mut ValueLoss,
data: &PPOBatchData,

For example, if you'd like to log both the policy loss and a derived quantity from the rollout data:

fn batch_hook(policy_loss: &mut PolicyLoss, data: &PPOBatchData) -> candle_core::Result<bool> {
    println!("policy loss: {}", policy_loss.to_scalar::<f32>()?);
    println!("logp diff: {:?}", data.logp_diff.deref());
    Ok(false)
}

Training

After this, you can run the training loop, and your terminal should display both the policy loss and the logp diffs:

let env_pool = DummyVecEnv {
    n_env: 1,
    env,
    rollout_mode: RolloutMode::StepBound { n_steps: 1024 },
};
let learning_schedule = LearningSchedule::new(100);
let mut algo = OnPolicyAlgorithm::new(env_pool, agent, learning_schedule);
algo.train()?;

For a more advanced and interactive example, see the r2l-tui crate.

Roadmap

Current version: v0.0.1
The project is in an early experimental phase. Expect missing features, frequent breaking changes, bugs, and everything in between.

`v0.1.0` – Core Algorithm Coverage (SB3 parity)

Implement all algorithms available in Stable Baselines3
Add benchmarks for simple environments (e.g. CartPole, Pendulum)
Introduce a high-level builder API for setting up agents with established hooks for logging, observability, and training control
Expect significant API changes

`v1.0.0` – Extended Algorithm Set (Tianshou parity)

Implement remaining algorithms from Tianshou
Finalize the hook and training APIs
Provide stable interfaces for embedding, visualization, and training control
Improve documentation, examples, and possibly add multi-agent support

Future directions may include:

r2l-gui using egui
Snapshotting via SafeTensors or ONNX
TensorBoard integration for monitoring

Contributing

Any and all contributions are welcome. If you have a feature request, let me know by opening an issue about it, but please understand that while the project is ambitious, there are no corporate backers and I work on it in my spare time.

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
assets		assets
book		book
crates		crates
.gitignore		.gitignore
.gitmodules		.gitmodules
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

r2l - a Rust Reinforcement Learning Library

Why r2l

What's included

How to use it

Adding a batching hook

Hook function flexibility

Training

Roadmap

`v0.1.0` – Core Algorithm Coverage (SB3 parity)

`v1.0.0` – Extended Algorithm Set (Tianshou parity)

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

r2l - a Rust Reinforcement Learning Library

Why r2l

What's included

How to use it

Adding a batching hook

Hook function flexibility

Training

Roadmap

v0.1.0 – Core Algorithm Coverage (SB3 parity)

v1.0.0 – Extended Algorithm Set (Tianshou parity)

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`v0.1.0` – Core Algorithm Coverage (SB3 parity)

`v1.0.0` – Extended Algorithm Set (Tianshou parity)

Packages