
Conversation

Member

@pkooij pkooij commented Dec 12, 2025

Differences compared to original:

  • Handling insufficient history (sampling):
    LeRobot:
    Clamps indices to valid bounds and applies copy-padding at the beginning and end.
    Samples frames in a bidirectional manner (both backward- and forward-looking).
    Original:
    Uses an adaptive stride when looking backward and there is not enough history.
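
The clamp-and-pad behavior described above can be sketched in a few lines (function and parameter names here are illustrative, not the actual LeRobot API):

```python
def sample_history_indices(t, ep_start, ep_end, offsets):
    """Sample frame indices around timestep t, bidirectionally.

    Each relative offset is clamped to the episode's valid bounds,
    so out-of-range offsets collapse onto the first/last frame --
    equivalent to copy-padding at the beginning and end.
    """
    return [min(max(t + off, ep_start), ep_end) for off in offsets]


# Near the episode start, backward-looking offsets repeat frame 0:
# sample_history_indices(0, 0, 9, [-2, -1, 0, 1]) -> [0, 0, 0, 1]
```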

Progress visualization

Of the trained SARM model for sparse and dense annotations (sparse: inference on every frame; dense: stride = 30)
image
Screenshot 2025-12-13 at 17 44 28

Inference example on a failed episode (sparse and dense), from https://huggingface.co/datasets/lerobot-data-collection/eval_pi0_fold_11-30_1 (not in the training data)
image

image

TODO:

  • Test RA_BC on real robot
  • Add RA_BC training loss curves

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

</hfoption>
<hfoption id="dual">

Visualize annotations using the `--visualize-only` flag:
Contributor

We can remove this example, or say that this command shows both annotations, since it looks similar to the one above.


| Argument | Description |
| ---------------------- | -------------------------------------------------------- |
| `--visualize-only` | Only visualize predictions (no RABC computation) |
Contributor

Change RABC to RA-BC for consistency


# RA-BC (Reward-Aligned Behavior Cloning) parameters
use_rabc: bool = False # Enable reward-weighted training
rabc_progress_path: str | None = None # Path to precomputed SARM progress parquet file
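
As a rough sketch of how such a precomputed progress file could be turned into per-sample weights (the column layout and the delta-based weighting here are assumptions, not the actual file format):

```python
import pandas as pd


def progress_to_weights(progress_df):
    """Derive per-frame RA-BC weights from SARM progress predictions.

    Assumed parquet layout: one row per frame with columns
    'episode_index', 'frame_index', and 'progress' in [0, 1].
    """
    df = progress_df.sort_values(["episode_index", "frame_index"]).copy()
    # Per-frame progress delta within each episode (first frame -> 0.0).
    delta = df.groupby("episode_index")["progress"].diff().fillna(0.0)
    # Frames where progress regresses contribute zero weight.
    df["weight"] = delta.clip(lower=0.0)
    return df
```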
Contributor

Is it possible to infer the SARM progress from the dataset?

"ninja>=1.11.1,<2.0.0",
"flash-attn>=2.5.9,<3.0.0 ; sys_platform != 'darwin'"
]
sarm = ["lerobot[transformers-dep]", "faker>=33.0.0,<35.0.0"]
Contributor

I think we need to add matplotlib for subtask_annotation.py

@pkooij pkooij self-assigned this Dec 15, 2025
@pkooij pkooij added the policies Items related to robot policies label Dec 15, 2025
@@ -0,0 +1,1221 @@
#!/usr/bin/env python
Contributor

I got a memory/RAM issue running this script with Qwen/Qwen3-VL-4B-Instruct. We need to investigate why.

from pydantic import BaseModel, Field
from qwen_vl_utils import process_vision_info
from rich.console import Console
from transformers import AutoProcessor, Qwen3VLMoeForConditionalGeneration
Contributor

The subtask annotation can only use the Qwen model family; it may be better to make it work with other models that could outperform Qwen in the future.

def __init__(
self,
config: ACTConfig,
**kwargs,
Contributor

We need to add per-sample losses.
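
A minimal sketch of what a per-sample loss could look like, needed so RA-BC can weight each batch element individually (NumPy is used here as a framework-agnostic stand-in; names are illustrative):

```python
import numpy as np


def rabc_weighted_loss(pred, target, weights):
    """Per-sample L2 losses followed by reward-aligned weighting.

    Instead of reducing the loss to a single scalar with a plain mean,
    keep one loss value per batch element, then take a weighted mean
    using the RA-BC weights.
    """
    # Average the squared error over all non-batch dimensions.
    per_sample = ((pred - target) ** 2).reshape(len(pred), -1).mean(axis=1)
    # Weighted mean over the batch.
    return float((weights * per_sample).sum() / weights.sum())
```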

def __init__(
self,
config: DiffusionConfig,
**kwargs,
Contributor

We need to add per-sample losses here too.

config_class = GrootConfig

def __init__(self, config: GrootConfig):
def __init__(self, config: GrootConfig, **kwargs):
Contributor

We need to add per-sample losses here too.

def __init__(
self,
config: TDMPCConfig,
**kwargs,
Contributor

We need to add per-sample losses here too.

name = "xvla"

def __init__(self, config: XVLAConfig):
def __init__(self, config: XVLAConfig, **kwargs):
Contributor

We need to add per-sample losses here too.

def __init__(
self,
config: VQBeTConfig | None = None,
**kwargs,
Contributor

We need to add per-sample losses here too.

Comment on lines +32 to +35
"StageTransformer",
"SubtaskTransformer",
"gen_stage_emb",
"SARMEncodingProcessorStep",
Contributor

Do we need to add those here?

Comment on lines +27 to +44
# Full RABC computation with visualizations
python src/lerobot/policies/sarm/compute_rabc_weights.py \\
--dataset-repo-id lerobot/aloha_sim_insertion_human \\
--reward-model-path pepijn223/sarm_single_uni4
# Faster computation with stride (compute every 5 frames, interpolate the rest)
python src/lerobot/policies/sarm/compute_rabc_weights.py \\
--dataset-repo-id lerobot/aloha_sim_insertion_human \\
--reward-model-path pepijn223/sarm_single_uni4 \\
--stride 5
# Visualize predictions only (no RABC computation)
python src/lerobot/policies/sarm/compute_rabc_weights.py \\
--dataset-repo-id lerobot/aloha_sim_insertion_human \\
--reward-model-path pepijn223/sarm_single_uni4 \\
--visualize-only \\
--num-visualizations 5
Contributor

It would be nice to change RABC to RA-BC here as well.

return img


def visualize_episode(
Contributor

Does this work the same as the visualize_episode function in subtask_annotation.py?

Contributor

@s1lent4gnt s1lent4gnt left a comment

First code review looks good to me.
I want to discuss these points:

  • Should we add a separate folder for reward models, since we will add ReWiND later?
  • Do we really need to use the Faker lib?
  • Do we need to let users add manual dataset subtask annotations as an option?
  • The original SARM implementation uses two different optimizers for StageTransformer and SubtaskTransformer; should we do the same?
