
RL stack refactoring #3075

Open
s1lent4gnt wants to merge 2 commits into main from user/khalil-meftah/2026-02-16-rl-stack-refactor

@s1lent4gnt (Member) commented Mar 3, 2026

refactor(rl): Introduce a modular RL stack to support future RL algorithms and VLA fine-tuning with RL

Type / Scope

  • Type: Refactor
  • Scope: src/lerobot/rl/, src/lerobot/configs/, src/lerobot/processor/

Summary / Motivation

HIL-SERL is working (#504, PR #644), but its training logic was coupled to the learner script. This refactor introduces three abstractions, RLAlgorithm, RLTrainer, and DataMixer, so that adding a new algorithm takes one file plus a config, following the same pattern as policies. There are no breaking changes to the CLI or config. This PR is Phase 1 of a broader plan.
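To make the "one file plus a config" idea concrete, here is a minimal, dependency-free sketch of a registry-plus-factory pattern. It is not the PR's code: the PR registers configs through draccus's ChoiceRegistry, while this sketch uses a plain dict. `register_algorithm`, the `make_algorithm(name, **overrides)` signature, `ToyConfig`, and `ToyAlgorithm` are hypothetical names chosen for illustration.

```python
import abc
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Tuple

# Hypothetical registry mapping a "type" name (what a CLI/JSON config would select)
# to a (config class, algorithm class) pair. The PR uses draccus.ChoiceRegistry on
# the config side; a plain dict keeps this sketch dependency-free.
ALGORITHM_REGISTRY: Dict[str, Tuple[type, type]] = {}


@dataclass
class RLAlgorithmConfig:
    """Base class for per-algorithm configs."""


class RLAlgorithm(abc.ABC):
    """Simplified contract: every algorithm implements update()."""

    def __init__(self, config: RLAlgorithmConfig):
        self.config = config

    @abc.abstractmethod
    def update(self, batch_iterator: Iterator[Any]) -> Dict[str, float]:
        """Run this algorithm's gradient steps and return scalar stats."""


def register_algorithm(name: str, config_cls: type):
    """Class decorator registering an algorithm together with its config class."""

    def wrap(algorithm_cls: type) -> type:
        ALGORITHM_REGISTRY[name] = (config_cls, algorithm_cls)
        return algorithm_cls

    return wrap


def make_algorithm(name: str, **overrides: Any) -> RLAlgorithm:
    """Factory: build the registered config (with overrides), then the algorithm."""
    config_cls, algorithm_cls = ALGORITHM_REGISTRY[name]
    return algorithm_cls(config_cls(**overrides))


# ---- A new algorithm: one config dataclass + one algorithm class ----
@dataclass
class ToyConfig(RLAlgorithmConfig):
    scale: float = 1.0


@register_algorithm("toy", ToyConfig)
class ToyAlgorithm(RLAlgorithm):
    def update(self, batch_iterator):
        batch = next(batch_iterator)  # a single "gradient step" on one batch
        return {"loss": self.config.scale * sum(batch)}
```

With this sketch, `make_algorithm("toy", scale=2.0).update(iter([[1.0, 2.0]]))` returns `{"loss": 6.0}`. The factory itself never changes when an algorithm is added; the decorator on the new class is the only registration step.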

Related issues

  • Roadmap / context: #3076 — RL Stack Refactoring call for contributions

What changed

  • src/lerobot/rl/algorithms/base.py (NEW): RLAlgorithm ABC, RLAlgorithmConfig (draccus ChoiceRegistry), TrainingStats dataclass.
  • src/lerobot/rl/algorithms/sac.py (NEW): SAC training logic moved out of the learner; consumes batches via the batch-iterator pattern.
  • src/lerobot/rl/algorithms/__init__.py (NEW): Registry, make_algorithm() factory.
  • src/lerobot/rl/trainer.py (NEW): RLTrainer — preprocesses batches, delegates to algorithm.update().
  • src/lerobot/rl/data_sources/data_mixer.py (NEW): DataMixer ABC, OnlineOfflineMixer.
  • src/lerobot/rl/data_sources/__init__.py (NEW): Exports.
  • src/lerobot/rl/learner.py (MODIFIED): Delegates to algorithm.update(); −305 lines.
  • src/lerobot/rl/actor.py (MODIFIED): Uses RLAlgorithm.select_action(), load_weights().
  • src/lerobot/configs/train.py (MODIFIED): RLAlgorithmConfig in TrainRLServerPipelineConfig.
  • src/lerobot/processor/normalize_processor.py (MODIFIED): Pre/post processors for observation normalization.
  • tests/rl/test_sac_algorithm.py (NEW): SAC algorithm tests.
  • tests/rl/test_trainer.py (NEW): RLTrainer tests.
  • tests/rl/test_data_mixer.py (NEW): DataMixer tests.
  • tests/rl/test_actor_learner.py (MODIFIED): Integration tests for new abstractions.
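The DataMixer role can be sketched as follows, assuming a mixer exposes a `sample(batch_size)` method plus an endless batch iterator that the trainer can hand to the algorithm. The method names, the `online_ratio` parameter, and sampling with replacement are assumptions for illustration, not the PR's OnlineOfflineMixer API.

```python
import abc
import random
from typing import Any, Iterator, List, Optional, Sequence


class DataMixer(abc.ABC):
    """Produces training batches; the trainer pulls from it (simplified)."""

    @abc.abstractmethod
    def sample(self, batch_size: int) -> List[Any]:
        """Return one batch of `batch_size` transitions."""

    def iterate(self, batch_size: int) -> Iterator[List[Any]]:
        """Endless batch iterator, suitable for passing to algorithm.update()."""
        while True:
            yield self.sample(batch_size)


class OnlineOfflineMixer(DataMixer):
    """Draws a fixed fraction of each batch from online data, the rest from offline.

    `online_ratio` is a hypothetical parameter name. Without an offline buffer,
    the whole batch comes from online data.
    """

    def __init__(self, online: Sequence[Any], offline: Optional[Sequence[Any]] = None,
                 online_ratio: float = 0.5, seed: Optional[int] = None):
        if not 0.0 <= online_ratio <= 1.0:
            raise ValueError("online_ratio must be in [0, 1]")
        self.online = online      # kept by reference, so new transitions are visible
        self.offline = offline
        self.online_ratio = online_ratio
        self.rng = random.Random(seed)

    def sample(self, batch_size: int) -> List[Any]:
        if not self.offline:
            n_online = batch_size
        else:
            n_online = round(batch_size * self.online_ratio)
        n_offline = batch_size - n_online
        # Sample with replacement, as a replay buffer typically does.
        batch = self.rng.choices(self.online, k=n_online)
        if n_offline:
            batch += self.rng.choices(self.offline, k=n_offline)
        return batch
```

For example, `OnlineOfflineMixer(online_buffer, offline_demos, online_ratio=0.25).sample(8)` draws 2 online and 6 offline items. Because the mixer keeps references to the buffers rather than copies, transitions appended to the online buffer are visible to later batches.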

How was this tested

  • Unit tests: pytest -q tests/rl/
  • Full suite: pytest
  • Tested with: gym-hil pick cube (PandaPickCubeGamepad-v0)

How to run locally (reviewer)

# Run RL tests
pytest -q tests/rl/

# RL training: run the learner and the actor in two separate terminals (requires gym-hil)
python -m lerobot.rl.learner --config_path json/train_gym_hil.json
python -m lerobot.rl.actor --config_path json/train_gym_hil.json

Checklist (required before merge)

  • Linting/formatting run (pre-commit run -a)
  • All tests pass locally (pytest)
  • Documentation updated
  • CI is green

Reviewer notes

  • The core abstraction to review is RLAlgorithm in algorithms/base.py — this is the contract that every future RL algorithm will implement. Feedback on the interface design is especially welcome.
  • The update(batch_iterator) pattern is intentional: algorithms own the gradient-step loop (including UTD ratio), while the trainer owns data mixing and preprocessing.
  • SAC is the only algorithm for now, so the abstractions are currently validated by SAC running end to end through them.
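A minimal sketch of that split of responsibilities, assuming the trainer wraps its batch source in a lazy preprocessing iterator and the algorithm pulls `utd_ratio` batches per `update()` call. `CountingAlgorithm`, `train_step`, and the mean-as-loss stand-in are hypothetical; how SAC schedules its own updates inside that loop is not reproduced here.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Batch = List[float]


class CountingAlgorithm:
    """Toy algorithm that owns its gradient-step loop, including the UTD ratio."""

    def __init__(self, utd_ratio: int = 1):
        self.utd_ratio = utd_ratio
        self.grad_steps = 0

    def update(self, batch_iterator: Iterator[Batch]) -> Dict[str, float]:
        loss = 0.0
        for _ in range(self.utd_ratio):      # utd_ratio gradient steps per call
            batch = next(batch_iterator)
            loss = sum(batch) / len(batch)   # stand-in for a real loss + optimizer step
            self.grad_steps += 1
        return {"loss": loss, "grad_steps": float(self.grad_steps)}


class RLTrainer:
    """Owns where batches come from and how they are preprocessed."""

    def __init__(self, algorithm: CountingAlgorithm, batches: Iterable[Batch],
                 preprocess: Callable[[Batch], Batch]):
        self.algorithm = algorithm
        # Lazy: a batch is preprocessed only when the algorithm pulls it.
        self._batch_iterator = map(preprocess, batches)

    def train_step(self) -> Dict[str, float]:
        return self.algorithm.update(self._batch_iterator)
```

In a real setup the `batches` argument would come from a data mixer's iterator. Because preprocessing is lazy, the trainer never prepares more batches than the algorithm consumes, so changing the UTD ratio stays an algorithm-side config change.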

…nd SAC restructuring

- Add RLAlgorithm base class and RLAlgorithmConfig with draccus.ChoiceRegistry
- Add RLTrainer for unified training orchestration with iterator pattern
- Add DataMixer and OnlineOfflineMixer for online/offline data mixing
- Restructure SAC algorithm with batch iterator and factory pattern
- Add observation normalization pre/post processors
- Add comprehensive tests for all new components
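One plausible shape for an observation-normalization pre/post processor pair is a mean/std transform and its exact inverse. The sketch assumes per-key statistics over flat float vectors; `ObservationNormalizer`, `preprocess`, and `postprocess` are hypothetical names, and the processors in normalize_processor.py may be organized differently.

```python
from dataclasses import dataclass
from typing import Dict, List

Obs = Dict[str, List[float]]


@dataclass
class ObservationNormalizer:
    """Mean/std normalization (preprocess) and its inverse (postprocess)."""

    mean: Obs
    std: Obs
    eps: float = 1e-8  # guards against division by zero for constant features

    def preprocess(self, obs: Obs) -> Obs:
        """Map x -> (x - mean) / (std + eps) for every key with statistics."""
        out = dict(obs)  # keys without stats pass through unchanged
        for key, key_mean in self.mean.items():
            if key in obs:
                out[key] = [(x - m) / (s + self.eps)
                            for x, m, s in zip(obs[key], key_mean, self.std[key])]
        return out

    def postprocess(self, obs: Obs) -> Obs:
        """Map x -> x * (std + eps) + mean, undoing preprocess()."""
        out = dict(obs)
        for key, key_mean in self.mean.items():
            if key in obs:
                out[key] = [x * (s + self.eps) + m
                            for x, m, s in zip(obs[key], key_mean, self.std[key])]
        return out
```

Using the same `eps` on both sides keeps the pair exactly inverse, so a preprocess/postprocess round trip reproduces the input up to floating-point error.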
@github-actions bot added the tests, configuration, and processor labels Mar 3, 2026
@s1lent4gnt self-assigned this Mar 3, 2026
@s1lent4gnt added the rl label Mar 3, 2026
@github-actions bot added the policies and examples labels Mar 11, 2026

Labels

configuration, examples, policies, processor, rl, tests
