FASTER: Value-Guided Sampling for Fast RL

Perry Dong* · Alexander Swerdlow* · Dorsa Sadigh · Chelsea Finn

Stanford University

* Equal contribution

Overview

Many of the strongest RL algorithms today rely on best-of-N action sampling with a value critic: they fully denoise N candidate actions and keep only one. FASTER recovers the gains of best-of-N without paying that sampling cost.

FASTER frames best-of-N denoising as a Markov Decision Process over the diffusion trajectory and learns a denoise critic that scores candidates before denoising completes. At inference time we sample N noise seeds, rank them with the critic, and fully denoise only the top-ranked seed, so the denoising cost stays at a single rollout regardless of N.
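The selection step described above can be illustrated with a toy sketch. It is not the repository's implementation: `ToyDenoiser`, `value_critic`, and `denoise_critic` are hypothetical stand-ins, actions are scalars, and the toy denoise critic is assumed to predict the final value perfectly, which a learned critic would only approximate. The point is only to contrast how many full denoising passes each strategy needs.

```python
import random


def clip(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))


class ToyDenoiser:
    """Hypothetical stand-in for a diffusion policy; counts full denoising passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, seed):
        self.calls += 1
        return clip(seed)  # toy "denoising": map the noise seed to a bounded action


def value_critic(action):
    # Toy Q-value over finished actions, peaked at action = 0.3.
    return -(action - 0.3) ** 2


def denoise_critic(seed):
    # Toy critic over noise seeds. Assumption: it predicts the value of the
    # action the seed would denoise to, without running the denoiser.
    return value_critic(clip(seed))


def best_of_n(seeds, denoise, q):
    # Baseline: fully denoise every candidate, then keep the highest-value action.
    actions = [denoise(z) for z in seeds]
    return max(actions, key=q)


def value_guided(seeds, denoise, seed_critic):
    # Value-guided selection: rank the seeds first, denoise only the winner.
    best_seed = max(seeds, key=seed_critic)
    return denoise(best_seed)


rng = random.Random(0)
seeds = [rng.gauss(0.0, 1.0) for _ in range(8)]

baseline_denoiser, guided_denoiser = ToyDenoiser(), ToyDenoiser()
a_baseline = best_of_n(seeds, baseline_denoiser, value_critic)
a_guided = value_guided(seeds, guided_denoiser, denoise_critic)

print(baseline_denoiser.calls, guided_denoiser.calls)  # prints: 8 1
```

In this toy setting both strategies return the same action, but the baseline runs the denoiser N = 8 times while the value-guided variant runs it once. The critic is still evaluated N times, which is cheap relative to a full denoising rollout.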

Setup

uv sync
source .env && python scripts/download_robomimic_datasets.py

The code expects the Robomimic low-dim datasets (low_dim_v141.hdf5) under $ROBOMIMIC_DATASETS_PATH, which defaults to ./datasets/robomimic.
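To check that a dataset is where the code will look for it, a small helper like the one below may be useful. It is a sketch under assumptions: `dataset_path` is a hypothetical helper, not part of this repository, and the `<root>/<task>/<dataset type>/low_dim_v141.hdf5` layout is assumed from the usual Robomimic download layout rather than confirmed by this README.

```python
import os
from pathlib import Path

DEFAULT_ROOT = "./datasets/robomimic"  # default stated in this README


def dataset_path(env_name, dataset_dir, env=None):
    """Build the expected dataset location (hypothetical helper)."""
    env = os.environ if env is None else env
    root = Path(env.get("ROBOMIMIC_DATASETS_PATH", DEFAULT_ROOT))
    # Assumed layout: <root>/<task>/<dataset type>/low_dim_v141.hdf5
    return root / env_name / dataset_dir / "low_dim_v141.hdf5"


# With the variable unset, the path falls back to the default root.
default_path = dataset_path("can", "ph", env={})
# With the variable set, the same layout is resolved under the custom root.
custom_path = dataset_path(
    "lift", "ph", env={"ROBOMIMIC_DATASETS_PATH": "/data/robomimic"}
)

print(default_path, default_path.exists())
```

Passing `env` explicitly keeps the helper easy to test; omit it to read the real environment after `source .env`.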

Short sanity run of FASTER-EXPO online to check the setup:

source .env && WANDB_MODE=offline python train_robo.py \
  --dataset_dir=ph \
  --config.model_cls=FasterEXPOLearner \
  --log_dir=exp \
  --env_name=can \
  --eval_interval=1 \
  --eval_episodes=2 \
  --start_training=1 \
  --max_steps=2

Training Commands

Please see the following scripts for the different training settings.

| Setting | Task | Script |
| --- | --- | --- |
| FASTER-EXPO online | can | `scripts/faster_expo_online_can.sh` |
| FASTER-EXPO online | lift | `scripts/faster_expo_online_lift.sh` |
| FASTER-EXPO online | square | `scripts/faster_expo_online_square.sh` |
| FASTER-EXPO online | tool_hang | `scripts/faster_expo_online_tool_hang.sh` |
| FASTER-IDQL online | can | `scripts/faster_idql_online_can.sh` |
| FASTER-IDQL online | lift | `scripts/faster_idql_online_lift.sh` |
| FASTER-IDQL online | square | `scripts/faster_idql_online_square.sh` |
| FASTER-EXPO batch-online | can | `scripts/faster_expo_batch_online_can.sh` |
| FASTER-EXPO batch-online | lift | `scripts/faster_expo_batch_online_lift.sh` |
| FASTER-EXPO batch-online | square | `scripts/faster_expo_batch_online_square.sh` |

Each script accepts extra CLI overrides via $@. For example:

bash scripts/faster_expo_online_can.sh --log_dir=exp --seed=0

Outputs

Each run creates a directory under --log_dir (default: ./exp), for example:

exp/2026_04_17__12_34_56__s0/
├── flags.json
├── train.csv
├── eval.csv
├── checkpoints/   # if --checkpoint_model=True
└── buffers/       # if --checkpoint_buffer=True
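For inspecting results, a minimal sketch along these lines may help. It assumes that run directory names sort chronologically (inferred from the single example name above) and it makes no assumption about the CSV columns. `latest_run` and `load_run` are hypothetical helpers, and the demo files, including the `seed`, `step`, and `value` fields, are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def latest_run(log_dir):
    """Return the most recent run directory under log_dir, or None if there is none."""
    # Assumption: timestamped names sort lexicographically in chronological order.
    runs = sorted(p for p in Path(log_dir).iterdir() if p.is_dir())
    return runs[-1] if runs else None


def load_run(run_dir):
    """Load flags.json plus whichever of train.csv / eval.csv exist."""
    run_dir = Path(run_dir)
    flags = json.loads((run_dir / "flags.json").read_text())
    tables = {}
    for name in ("train.csv", "eval.csv"):
        path = run_dir / name
        if path.exists():
            with path.open(newline="") as f:
                tables[name] = list(csv.DictReader(f))  # one dict per row
    return flags, tables


# Demo on a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2026_04_17__12_34_56__s0", "2026_04_18__09_00_00__s1"):
        run = Path(tmp) / name
        run.mkdir()
        (run / "flags.json").write_text(json.dumps({"seed": int(name[-1])}))
        (run / "eval.csv").write_text("step,value\n0,0.0\n")

    newest_name = latest_run(tmp).name
    flags, tables = load_run(Path(tmp) / newest_name)

print(newest_name, flags, list(tables))
```

For real runs, call `load_run(latest_run("exp"))` and inspect `tables["eval.csv"][0].keys()` to see which columns the training script actually writes.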

Acknowledgements

The training infrastructure builds on RLPD, IDQL, and Robomimic. We thank the authors for their open-source releases.

Citation

@article{dong2026faster,
  title   = {FASTER: Value-Guided Sampling for Fast RL},
  author  = {Dong, Perry and Swerdlow, Alexander and Sadigh, Dorsa and Finn, Chelsea},
  journal = {arXiv preprint arXiv:2604.19730},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.19730}
}
