RL stack refactoring by s1lent4gnt · Pull Request #3075 · huggingface/lerobot

s1lent4gnt · 2026-03-03T16:14:48Z

refactor(rl): Introduce a modular RL stack to support future RL algorithms and VLA fine-tuning with RL

Type / Scope

Type: Refactor
Scope: src/lerobot/rl/, src/lerobot/configs/, src/lerobot/processor/

Summary / Motivation

HIL-SERL is working (#504, PR #644), but its training logic was coupled to the learner script. This refactor introduces RLAlgorithm, RLTrainer, and DataMixer so that adding a new algorithm takes one file plus a config, following the same pattern as policies. There are no breaking changes to the CLI or config. This is Phase 1 of a broader plan.
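The "one file plus a config" pattern depends on a registry that maps a config name to a config class, and that config class to the algorithm built from it. The sketch below shows that idea using only the standard library. It is an illustration, not the PR's code: the PR uses a draccus registry for RLAlgorithmConfig, and the names `AlgorithmConfig.register`, `register_algorithm`, `CONFIG_REGISTRY`, `ALGORITHM_REGISTRY`, and the `make_algorithm(name, **overrides)` signature are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical registries standing in for the draccus-based RLAlgorithmConfig
# registry: config name -> config class, and config class -> algorithm class.
CONFIG_REGISTRY: Dict[str, type] = {}
ALGORITHM_REGISTRY: Dict[type, type] = {}


@dataclass
class AlgorithmConfig:
    """Base class for algorithm configs; subclasses register under a name."""

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(subclass: type) -> type:
            if name in CONFIG_REGISTRY:
                raise ValueError(f"Config {name!r} is already registered")
            CONFIG_REGISTRY[name] = subclass
            return subclass

        return decorator


def register_algorithm(config_cls: type) -> Callable[[type], type]:
    """Associate an algorithm class with the config class it consumes."""

    def decorator(algorithm_cls: type) -> type:
        ALGORITHM_REGISTRY[config_cls] = algorithm_cls
        return algorithm_cls

    return decorator


# Adding a new algorithm: one config class plus one algorithm class.
@AlgorithmConfig.register("sac")
@dataclass
class SACConfig(AlgorithmConfig):
    utd_ratio: int = 1
    discount: float = 0.99


@register_algorithm(SACConfig)
class SAC:
    def __init__(self, config: SACConfig) -> None:
        self.config = config


def make_algorithm(name: str, **overrides: Any) -> Any:
    """Build a config from its registered name, then the matching algorithm."""
    if name not in CONFIG_REGISTRY:
        raise ValueError(f"Unknown algorithm {name!r}; available: {sorted(CONFIG_REGISTRY)}")
    config_cls = CONFIG_REGISTRY[name]
    return ALGORITHM_REGISTRY[config_cls](config_cls(**overrides))
```

With this layout, `make_algorithm("sac", utd_ratio=2)` returns a `SAC` instance whose config has `utd_ratio == 2`. Callers never import the algorithm class directly, which is what allows a new algorithm to stay within its own file.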

Related issues

Roadmap / context: #3076 — RL Stack Refactoring call for contributions

What changed

src/lerobot/rl/algorithms/base.py (NEW): RLAlgorithm ABC, RLAlgorithmConfig draccus registry, TrainingStats dataclass.
src/lerobot/rl/algorithms/sac.py (NEW): SAC logic moved out of the learner; uses the batch iterator pattern.
src/lerobot/rl/algorithms/__init__.py (NEW): Registry, make_algorithm() factory.
src/lerobot/rl/trainer.py (NEW): RLTrainer — preprocesses batches, delegates to algorithm.update().
src/lerobot/rl/data_sources/data_mixer.py (NEW): DataMixer ABC, OnlineOfflineMixer.
src/lerobot/rl/data_sources/__init__.py (NEW): Exports.
src/lerobot/rl/learner.py (MODIFIED): Delegates to algorithm.update(); −305 lines.
src/lerobot/rl/actor.py (MODIFIED): Uses RLAlgorithm.select_action(), load_weights().
src/lerobot/configs/train.py (MODIFIED): RLAlgorithmConfig in TrainRLServerPipelineConfig.
src/lerobot/processor/normalize_processor.py (MODIFIED): Pre/post processors for observation normalization.
tests/rl/test_sac_algorithm.py (NEW): SAC algorithm tests.
tests/rl/test_trainer.py (NEW): RLTrainer tests.
tests/rl/test_data_mixer.py (NEW): DataMixer tests.
tests/rl/test_actor_learner.py (MODIFIED): Integration tests for new abstractions.
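As a rough illustration of what a DataMixer does, the sketch below builds each batch by drawing a fixed fraction from online data and the remainder from offline data. This is a hypothetical, stdlib-only version: plain Python sequences stand in for replay buffers, sampling is uniform with replacement, and the `online_ratio` and `seed` parameters are assumptions rather than the PR's actual OnlineOfflineMixer interface.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Sequence


class DataMixer(ABC):
    """Produces training batches from one or more data sources."""

    @abstractmethod
    def sample(self, batch_size: int) -> List[Any]:
        ...


class OnlineOfflineMixer(DataMixer):
    """Draws a fixed fraction of each batch from online data, the rest from offline data.

    Hypothetical sketch: sequences stand in for replay buffers, and sampling
    is uniform with replacement.
    """

    def __init__(
        self,
        online: Sequence[Any],
        offline: Sequence[Any],
        online_ratio: float = 0.5,
        seed: Optional[int] = None,
    ) -> None:
        if not 0.0 <= online_ratio <= 1.0:
            raise ValueError("online_ratio must be in [0, 1]")
        self.online = online
        self.offline = offline
        self.online_ratio = online_ratio
        self._rng = random.Random(seed)

    def sample(self, batch_size: int) -> List[Any]:
        n_online = round(batch_size * self.online_ratio)
        n_offline = batch_size - n_online
        if (n_online and not self.online) or (n_offline and not self.offline):
            raise ValueError("A data source needed for this batch is empty")
        batch = [self._rng.choice(self.online) for _ in range(n_online)]
        batch += [self._rng.choice(self.offline) for _ in range(n_offline)]
        # Shuffle so online and offline samples are interleaved within the batch.
        self._rng.shuffle(batch)
        return batch
```

With `online_ratio=0.25` and `batch_size=8`, each batch contains exactly 2 online and 6 offline samples. Keeping this choice inside the mixer, rather than in the algorithm, fits the division of responsibilities described in the reviewer notes.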

How was this tested

Unit tests: pytest -q tests/rl/
Full suite: pytest
Tested with: gym-hil pick cube (PandaPickCubeGamepad-v0)

How to run locally (reviewer)

```bash
# Run RL tests
pytest -q tests/rl/

# RL training (run in two separate terminals; requires gym-hil)
# Terminal 1: learner
python -m lerobot.rl.learner --config_path json/train_gym_hil.json
# Terminal 2: actor
python -m lerobot.rl.actor --config_path json/train_gym_hil.json
```

Checklist (required before merge)

Linting/formatting run (pre-commit run -a)
All tests pass locally (pytest)
Documentation updated
CI is green

Reviewer notes

The core abstraction to review is RLAlgorithm in algorithms/base.py — this is the contract that every future RL algorithm will implement. Feedback on the interface design is especially welcome.
The update(batch_iterator) pattern is intentional: algorithms own the gradient-step loop (including UTD ratio), while the trainer owns data mixing and preprocessing.
SAC is the only algorithm for now. So far, the abstractions are validated only by SAC running end to end through them.
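The `update(batch_iterator)` split can be sketched as follows. The trainer exposes a lazy, unbounded iterator of preprocessed batches, and the algorithm decides how many batches to pull, one per gradient step, according to its UTD ratio. This is a toy stand-in, not the PR's classes. `ToyAlgorithm` fits a single scalar by gradient descent in place of SAC, and `Trainer`, `sample_batch`, `preprocess`, and `training_step` are hypothetical names. Only the `TrainingStats`, `select_action`, and `update` names come from the description above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List

Batch = List[float]


@dataclass
class TrainingStats:
    losses: Dict[str, float] = field(default_factory=dict)
    num_gradient_steps: int = 0


class RLAlgorithm(ABC):
    @abstractmethod
    def select_action(self, observation: Any) -> Any:
        ...

    @abstractmethod
    def update(self, batch_iterator: Iterator[Batch]) -> TrainingStats:
        """Run one training iteration, pulling as many batches as it needs."""


class ToyAlgorithm(RLAlgorithm):
    """Stand-in for SAC: each gradient step moves a scalar toward the batch mean."""

    def __init__(self, utd_ratio: int = 1, lr: float = 0.25) -> None:
        self.utd_ratio = utd_ratio
        self.lr = lr
        self.param = 0.0

    def select_action(self, observation: float) -> float:
        return self.param * observation

    def update(self, batch_iterator: Iterator[Batch]) -> TrainingStats:
        stats = TrainingStats()
        # The algorithm, not the trainer, decides how many gradient steps
        # (and therefore how many batches) one update consumes: the UTD ratio.
        for _ in range(self.utd_ratio):
            batch = next(batch_iterator)
            target = sum(batch) / len(batch)
            error = self.param - target
            stats.losses["loss"] = error**2  # squared-error loss before the step
            self.param -= self.lr * 2.0 * error  # gradient of (param - target)**2
            stats.num_gradient_steps += 1
        return stats


class Trainer:
    """Owns data sampling and preprocessing; delegates optimization to the algorithm."""

    def __init__(
        self,
        algorithm: RLAlgorithm,
        sample_batch: Callable[[], Batch],
        preprocess: Callable[[Batch], Batch],
    ) -> None:
        self.algorithm = algorithm
        self.sample_batch = sample_batch
        self.preprocess = preprocess

    def _batches(self) -> Iterator[Batch]:
        # Lazy and unbounded: a batch is sampled only when the algorithm asks.
        while True:
            yield self.preprocess(self.sample_batch())

    def training_step(self) -> TrainingStats:
        return self.algorithm.update(self._batches())
```

Because the iterator is lazy, a call to `training_step()` with `utd_ratio=3` samples and preprocesses exactly three batches. The trainer never needs to know how many gradient steps a given algorithm takes per update.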

…nd SAC restructuring
- Add RLAlgorithm base class and RLAlgorithmConfig with draccus.ChoiceRegistry
- Add RLTrainer for unified training orchestration with iterator pattern
- Add DataMixer and OnlineOfflineMixer for online/offline data mixing
- Restructure SAC algorithm with batch iterator and factory pattern
- Add observation normalization pre/post processors
- Add comprehensive tests for all new components

github-actions bot added the tests, configuration, and processor labels Mar 3, 2026

s1lent4gnt mentioned this pull request Mar 3, 2026 in RL Stack Refactoring: Call for Contributions #3076 (open, 13 tasks)

s1lent4gnt self-assigned this Mar 3, 2026

s1lent4gnt added the rl label Mar 3, 2026

refactor: decouple policy from algorithm (1f5487e)

github-actions bot added the policies and examples labels Mar 11, 2026

s1lent4gnt wants to merge 2 commits into main from user/khalil-meftah/2026-02-16-rl-stack-refactor