
Conversation

@jpizarrom (Contributor) commented Dec 7, 2025

What this does

This PR is an experiment that adds a VLA as the behavior-cloning (BC) actor in ACFQL, built on top of #1818.
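
For context on the general idea (not taken from this PR's code): one common way to use an advantage signal inside behavior cloning is AWR/AWAC-style re-weighting of the BC loss by exp(beta * A), with A coming from a separate value net. The sketch below shows only that generic recipe; the function name and beta are hypothetical, and the PR's recap_style_advantages option may well work differently (for example by conditioning the VLA on the advantage together with CFG).

```python
# Illustrative only: AWR/AWAC-style advantage weighting of a chunked BC loss.
# `advantage_weighted_bc_loss` and `beta` are hypothetical names, not from this PR.
import torch


def advantage_weighted_bc_loss(pred_chunk, expert_chunk, advantages, beta=1.0):
    """pred_chunk, expert_chunk: (B, chunk, action_dim); advantages: (B,)."""
    per_sample = ((pred_chunk - expert_chunk) ** 2).mean(dim=(1, 2))         # per-sample BC error, (B,)
    weights = torch.clamp(torch.exp(beta * advantages.detach()), max=100.0)  # up-weight positive-advantage data
    return (weights * per_sample).mean()
```

Clamping the exponential weight is a standard trick to keep a few large advantages from dominating the batch.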

How it was tested

uv run python -m lerobot.rl.acfqlvla.learner \
    --config_path ../lerobot-configs-grocery-so100-fqlvla/train_gym_hil_env_fql_lilkm_pushfreq25x20251007.json \
    --wandb.enable=true \
    --policy.bc_policy=SmolVLA \
    --policy.vla_pretrained_name_or_path=lerobot/smolvla_base \
    --batch_size=64 \
    --policy.cache_observation_features_vla=true \
    --policy.cfg.enabled=true \
    --policy.recap_style_advantages=true \
    --save_freq=500

uv run python -m lerobot.rl.acfql.learner \
    --config_path ../lerobot-configs-grocery-so100-fqlvla/train_config_hilserl_so100_gamepad_ee_20fps.json \
    --job_name=7d-pre1m-25120222 \
    --wandb.project=hilserl-acfql-vla-recap-so100-grocery-butter-real-so100-20fps \
    --policy.offline_steps=1000000 \
    --policy.online_steps=0 \
    --policy.online_step_before_learning=5000 \
    --wandb.enable=true \
    --policy.actor_learner_config.policy_parameters_push_frequency=25 \
    --env.fps=20 \
    --save_replay_buffer_on_checkpoint=false \
    --save_offline_replay_buffer_on_checkpoint=false \
    --dataset.repo_id=jpizarrom/hilserl_so100_grocery_so100_2025112223_20 \
    --policy.offline_buffer_capacity=209290 \
    --policy.online_buffer_capacity=0 \
    --online_dataset=null \
    --policy.normalize_q_loss=null \
    --policy.storage_device_offline_replay_buffer=cuda \
    --policy.storage_device_replay_buffer=cpu \
    --policy.critic_grad_clip_norm=200.0 \
    --policy.actor_bc_grad_clip_norm=3.0 \
    --policy.actor_onestep_grad_clip_norm=800.0 \
    --policy.alpha=300.0 \
    --policy.shared_encoder=false \
    --policy.load_vlm_weights=false \
    --policy.chunk_size=10 \
    --policy.n_action_steps=10 \
    --policy.max_action_dim=32 \
    --policy.max_state_dim=32 \
    --policy.num_vlm_layers=16 \
    --policy.expert_width_multiplier=0.75 \
    --batch_size=64 \
    --policy.bc_policy=SmolVLA \
    --policy.cfg.enabled=true \
    --policy.recap_style_advantages=true \
    --save_freq=10000

TODO

  • use VLA + RECAP-style advantage signal as the Behavior Cloning actor (continuous value net, ResNet+MLP); see the value-net sketch after this list
  • offline RL learner
  • train on a task in sim with offline RL only
  • train on a task on the real so100 with offline RL only
  • online RL learner/actor
  • train on a task on the real so100 with offline RL + online RL
  • use VLA + RECAP-style advantage signal as Behavior Cloning actor (distributional value net)
  • share configs
  • test coverage

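A minimal sketch of what a "continuous value net, ResNet+MLP" could look like for RECAP-style advantages. The architecture below is an assumption for illustration; torchvision's resnet18 stands in for whatever image encoder the PR actually uses, and all dimensions are made up.

```python
# Hypothetical ResNet+MLP value head; not the implementation in this PR.
import torch
import torch.nn as nn
import torchvision


class ResNetMLPValueNet(nn.Module):
    def __init__(self, state_dim=32, hidden_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()  # expose the 512-dim pooled image features
        self.encoder = backbone
        self.mlp = nn.Sequential(
            nn.Linear(512 + state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, image, state):
        # image: (B, 3, H, W), state: (B, state_dim) -> V(s): (B,)
        feat = self.encoder(image)
        return self.mlp(torch.cat([feat, state], dim=-1)).squeeze(-1)
```

With such a net, a RECAP-style advantage for a transition could then be estimated as the (discounted) return minus V(s).
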
Also possibly worth exploring:

  • VLA as one-step actor
  • multi-task transformer-based critic (a rough sketch follows this list)
  • multi-task transformer-based value network

@jpizarrom changed the title to "[WIP] [HIL-SERL] Add Flow Q-learning (FQL) agent with action chunking + VLA" on Dec 7, 2025