
alexanderswerdlow/faster


FASTER: Value-Guided Sampling for Fast RL

Perry Dong*  ·  Alexander Swerdlow*  ·  Dorsa Sadigh  ·  Chelsea Finn

Stanford University

*Equal contribution

[Figure: FASTER overview]

Overview

Many of the strongest RL algorithms today rely on best-of-N action sampling with a value critic: they pay to fully denoise N candidate actions and then keep only one. FASTER recovers the gains of best-of-N without paying that sampling cost.
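For reference, the best-of-N baseline described above can be sketched as follows. This is a toy illustration, not code from this repository: `denoise_step`, `full_denoise`, `best_of_n`, and the critic `q_fn` are hypothetical stand-ins, and the "denoiser" simply halves the action at each step. The point is the cost accounting: each of the N candidates is denoised for all `num_steps` steps before the critic picks one.

```python
import random


def denoise_step(x, k):
    """Hypothetical denoising step (toy): halves every action dimension.

    A real diffusion policy would run a learned network conditioned on the
    observation and the step index k; k is unused in this toy.
    """
    return [0.5 * v for v in x]


def full_denoise(seed, num_steps, counter):
    """Run every denoising step on one noise seed, counting denoiser calls."""
    x = seed
    for k in reversed(range(num_steps)):
        x = denoise_step(x, k)
        counter["denoise_steps"] += 1
    return x


def best_of_n(q_fn, n, action_dim, num_steps, rng):
    """Best-of-N: fully denoise N candidates, keep the one q_fn scores highest."""
    counter = {"denoise_steps": 0}
    candidates = []
    for _ in range(n):
        seed = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
        candidates.append(full_denoise(seed, num_steps, counter))
    best = max(candidates, key=q_fn)  # q_fn: hypothetical value critic Q(a)
    return best, counter["denoise_steps"]


def toy_critic(a):
    """Toy critic: prefers actions close to 0.1 in every dimension."""
    return -sum((v - 0.1) ** 2 for v in a)


if __name__ == "__main__":
    action, cost = best_of_n(toy_critic, n=8, action_dim=2, num_steps=10, rng=random.Random(0))
    print(f"denoising steps paid: {cost}")  # 8 candidates x 10 steps = 80
```

The cost grows linearly in N because selection happens only after every candidate is fully denoised.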

FASTER frames best-of-N denoising as a Markov Decision Process over the diffusion trajectory and learns a denoise critic that scores candidates before denoising completes. At inference time, we sample N noise seeds, rank them with the critic, and fully denoise only the top-ranked seed, collapsing inference cost to a single denoising rollout regardless of N.
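The inference procedure described above can be sketched in the same toy setting. This is not the repository's implementation: `denoise_critic(obs, seed)` is a hypothetical stand-in for the learned denoise critic, assumed here to score a raw noise seed by the value of the action it would eventually denoise into, and `denoise_step` is a placeholder that halves the action. Only the top-ranked seed is denoised, so the number of denoising steps equals `num_steps` however large N is.

```python
import random


def denoise_step(x, k):
    """Hypothetical denoising step (toy): halves every action dimension."""
    return [0.5 * v for v in x]


def faster_select(denoise_critic, obs, n, action_dim, num_steps, rng):
    """Rank N noise seeds with a critic, then fully denoise only the best one.

    denoise_critic(obs, seed) is a hypothetical stand-in for a learned critic
    that estimates the value of the action a seed would denoise into.
    """
    seeds = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(n)]
    # Scoring happens before any denoising, so N only changes the critic calls.
    best_seed = max(seeds, key=lambda s: denoise_critic(obs, s))
    x, steps = best_seed, 0
    for k in reversed(range(num_steps)):
        x = denoise_step(x, k)
        steps += 1
    return x, steps


if __name__ == "__main__":
    for n in (1, 16, 256):
        _, steps = faster_select(lambda obs, s: s[0], None, n, 2, 10, random.Random(n))
        print(f"N={n}: {steps} denoising steps")
```

In this sketch the N-dependent work is confined to the critic calls on seeds; whether the real critic also sees intermediate denoising states is not specified here.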

Setup

uv sync
source .env && python scripts/download_robomimic_datasets.py

The code expects the Robomimic low-dim datasets (low_dim_v141.hdf5) under $ROBOMIMIC_DATASETS_PATH, which defaults to ./datasets/robomimic.
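Before launching a run, a quick pre-flight check can confirm the datasets are where the code will look. This is a hedged sketch: the per-file layout `<root>/<env_name>/<dataset_dir>/low_dim_v141.hdf5` (e.g. `can/ph/low_dim_v141.hdf5`) is an assumption chosen to mirror the `--env_name` and `--dataset_dir` flags, so confirm it against the training code. `robomimic_dataset_path` and `missing_datasets` are hypothetical helpers.

```python
import os
from pathlib import Path

DEFAULT_ROOT = "./datasets/robomimic"  # default stated in this README
FILENAME = "low_dim_v141.hdf5"


def robomimic_dataset_path(env_name, dataset_dir):
    """Expected dataset file for one task (the directory layout is an assumption).

    Assumes <root>/<env_name>/<dataset_dir>/low_dim_v141.hdf5, where root is
    $ROBOMIMIC_DATASETS_PATH if set (and non-empty), else ./datasets/robomimic.
    """
    root = os.environ.get("ROBOMIMIC_DATASETS_PATH") or DEFAULT_ROOT
    return Path(root) / env_name / dataset_dir / FILENAME


def missing_datasets(pairs):
    """Return the expected paths that are not present on disk."""
    paths = [robomimic_dataset_path(env, d) for env, d in pairs]
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    tasks = [("lift", "ph"), ("can", "ph"), ("square", "ph"), ("tool_hang", "ph")]
    for p in missing_datasets(tasks):
        print(f"missing: {p}")
```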

To verify the setup, run a short FASTER-EXPO online sanity check:
source .env && WANDB_MODE=offline python train_robo.py \
  --dataset_dir=ph \
  --config.model_cls=FasterEXPOLearner \
  --log_dir=exp \
  --env_name=can \
  --eval_interval=1 \
  --eval_episodes=2 \
  --start_training=1 \
  --max_steps=2

Training Commands

Please see the following scripts for the different training settings.

Setting                   Task       Script
FASTER-EXPO online        can        scripts/faster_expo_online_can.sh
FASTER-EXPO online        lift       scripts/faster_expo_online_lift.sh
FASTER-EXPO online        square     scripts/faster_expo_online_square.sh
FASTER-EXPO online        tool_hang  scripts/faster_expo_online_tool_hang.sh
FASTER-IDQL online        can        scripts/faster_idql_online_can.sh
FASTER-IDQL online        lift       scripts/faster_idql_online_lift.sh
FASTER-IDQL online        square     scripts/faster_idql_online_square.sh
FASTER-EXPO batch-online  can        scripts/faster_expo_batch_online_can.sh
FASTER-EXPO batch-online  lift       scripts/faster_expo_batch_online_lift.sh
FASTER-EXPO batch-online  square     scripts/faster_expo_batch_online_square.sh
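The script names in the table above follow a regular `faster_<algo>_<setting>_<task>.sh` pattern, so selecting a script programmatically can be sketched as a small lookup. `AVAILABLE` and `script_for` are hypothetical helpers, not part of the repository; the set of combinations is copied from the table and the naming rule is inferred from it.

```python
from pathlib import Path

# (algorithm, setting) -> tasks with a provided script, copied from the table.
AVAILABLE = {
    ("expo", "online"): {"can", "lift", "square", "tool_hang"},
    ("idql", "online"): {"can", "lift", "square"},
    ("expo", "batch_online"): {"can", "lift", "square"},
}


def script_for(algo, setting, task):
    """Map e.g. ("EXPO", "batch-online", "can") to its script path."""
    key = (algo.lower(), setting.lower().replace("-", "_"))
    tasks = AVAILABLE.get(key)
    if tasks is None or task not in tasks:
        raise ValueError(f"no script listed for FASTER-{algo.upper()} {setting} {task}")
    return Path("scripts") / f"faster_{key[0]}_{key[1]}_{task}.sh"


if __name__ == "__main__":
    print(script_for("EXPO", "online", "tool_hang"))
```

Raising on unlisted combinations (for example IDQL on tool_hang) avoids silently building a path to a script that does not exist.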

Each script accepts extra CLI overrides via $@. For example:

bash scripts/faster_expo_online_can.sh --log_dir=exp --seed=0
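Because extra flags are passed through `$@`, a multi-seed sweep only needs to vary `--seed`. The sketch below builds the command lines from Python and, with `dry_run=False`, runs them one after another via `subprocess.run`. `sweep_commands` and `run_sweep` are hypothetical helpers; the sketch assumes it is launched from the repository root in the same environment where the bash command above works.

```python
import shlex
import subprocess


def sweep_commands(script, seeds, log_dir="exp", extra=()):
    """One argv list per seed; any extra flags are appended after --seed."""
    return [
        ["bash", script, f"--log_dir={log_dir}", f"--seed={seed}", *extra]
        for seed in seeds
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; if dry_run is False, also run it sequentially."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    cmds = sweep_commands("scripts/faster_expo_online_can.sh", seeds=range(3))
    run_sweep(cmds)  # dry run: only prints the three commands
```

Runs execute sequentially, and `check=True` stops the sweep at the first run that exits with a non-zero status.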

Outputs

Each run creates a directory under --log_dir (default: ./exp), for example:

exp/2026_04_17__12_34_56__s0/
├── flags.json
├── train.csv
├── eval.csv
├── checkpoints/   # if --checkpoint_model=True
└── buffers/       # if --checkpoint_buffer=True
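To collect results across runs, a helper can walk `--log_dir` and recover the start time and seed from each directory name. The pattern `YYYY_MM_DD__HH_MM_SS__s<seed>` is inferred from the single example above and may not hold for every configuration; `parse_run_name`, `read_csv_rows`, and `list_runs` are hypothetical helpers. The columns of train.csv and eval.csv are not documented here, so rows are returned as dictionaries keyed by whatever header each file has.

```python
import csv
import json
import re
from datetime import datetime
from pathlib import Path

# Assumed pattern, inferred from exp/2026_04_17__12_34_56__s0/.
RUN_DIR_RE = re.compile(r"^(\d{4}_\d{2}_\d{2}__\d{2}_\d{2}_\d{2})__s(\d+)$")


def parse_run_name(name):
    """Return (start_time, seed) for a run directory name, or None."""
    m = RUN_DIR_RE.match(name)
    if m is None:
        return None
    started = datetime.strptime(m.group(1), "%Y_%m_%d__%H_%M_%S")
    return started, int(m.group(2))


def read_csv_rows(path):
    """Read a CSV log into a list of dicts keyed by its own header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def list_runs(log_dir="exp"):
    """Collect run dirs with their parsed name, flags.json, and eval.csv rows."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    runs = []
    for d in sorted(root.iterdir()):
        parsed = parse_run_name(d.name) if d.is_dir() else None
        if parsed is None:
            continue
        flags_path, eval_path = d / "flags.json", d / "eval.csv"
        runs.append({
            "path": d,
            "started": parsed[0],
            "seed": parsed[1],
            "flags": json.loads(flags_path.read_text()) if flags_path.exists() else None,
            "eval": read_csv_rows(eval_path) if eval_path.exists() else [],
        })
    return runs
```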

Acknowledgements

The training infrastructure builds on RLPD, IDQL, and Robomimic. We thank the authors for their open-source releases.

Citation

@article{dong2026faster,
  title   = {FASTER: Value-Guided Sampling for Fast RL},
  author  = {Dong, Perry and Swerdlow, Alexander and Sadigh, Dorsa and Finn, Chelsea},
  journal = {arXiv preprint arXiv:2604.19730},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.19730}
}
