
Code for FLOWER VLA for finetuning FLOWER on CALVIN and all LIBERO environments

FlowerVLA

Paper, Project Page

Moritz Reuss1, Hongyi Zhou1, Marcel Ruehle1, Ömer Erdinç Yağmurlu1, Fabian Otto2, Rudolf Lioutikov1

1Intuitive Robots Lab (IRL), Karlsruhe Institute of Technology (KIT) 2Microsoft Research

FLOWER: An efficient & versatile Flow-VLA

FLOWER VLA is a lightweight, efficient Vision-Language-Action (VLA) policy for robotic manipulation that achieves state-of-the-art performance on multiple benchmarks. It is built on a rectified flow architecture with several key features:

  • Efficient Architecture: At ~1B parameters, FLOWER is significantly smaller than most VLA models
  • Low Training Cost: Requires only ~200 GPU hours of pretraining
  • Low Memory Footprint: Uses <3 GB of GPU memory for inference in the single-image setting
  • SOTA Performance: Achieves state-of-the-art results on the CALVIN and LIBERO benchmarks

For pretraining FLOWER and finetuning on ALOHA, check out our other codebase.

Model Overview

FLOWER VLA uses a Florence-2-large-based VLM combined with a rectified flow architecture to predict robot actions from visual observations and language instructions. The model efficiently handles multi-step action sequences through a chunking mechanism.


Key architectural components:

  • Florence-2 DaViT image encoder [350M params]
  • Language conditioning through cross-attention
  • Half of the Florence-2 LLM layers for vision-language fusion [205M params]
  • Rectified flow prediction for fast action generation (all CALVIN and LIBERO results are achieved with just 4 denoising steps)
  • Global AdaLN for parameter-efficient conditioning
  • Action chunking for multi-step action generation
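The interplay of the last two bullets can be sketched in a few lines. The following is a toy illustration, not FLOWER's actual inference code: the velocity field stands in for the policy network, and `chunk_len` and `action_dim` are placeholder values, not FLOWER's real configuration.

```python
import numpy as np

def sample_action_chunk(velocity_fn, chunk_len=10, action_dim=7, num_steps=4, seed=0):
    """Integrate a rectified flow from Gaussian noise to an action chunk via Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((chunk_len, action_dim))  # noise sample at t=0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)  # one Euler step along the learned velocity field
    return x

# Toy velocity field that pulls samples toward zero; stands in for the policy network.
toy_velocity = lambda x, t: -x
chunk = sample_action_chunk(toy_velocity, num_steps=4)
print(chunk.shape)  # (10, 7)
```

With only 4 integration steps, a single forward pass per step yields a whole chunk of actions, which is what keeps inference cheap.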

Installation

To begin, clone this repository locally:

git clone --recurse-submodules [email protected]:intuitive-robots/flower_vla_calvin.git
export flower_calvin_ROOT=$(pwd)/flower_vla_calvin

Install the requirements. (Note: we provide a modified version of pyhash, given the numerous problems we encountered when installing it manually on our Slurm cluster.) You can also try installing setuptools via pip.

cd $flower_calvin_ROOT
conda create -n flower_cal python=3.9
conda activate flower_cal
cd calvin_env/tacto
pip install -e .
cd ..
pip install -e .
cd ..
cd LIBERO
pip install -r requirements.txt
pip install -e .
pip install numpy~=1.23
cd ..
pip install setuptools==57.5.0
cd pyhash-0.9.3
python setup.py build
python setup.py install
cd ..

Next, install the remaining packages:

pip install -r requirements.txt

Download

CALVIN Dataset

If you want to train on the CALVIN dataset, choose a split with:

cd $flower_calvin_ROOT/dataset
sh download_data.sh D  # or ABCD for the full split

Training

To train FLOWER with 4 GPUs, run:

python flower/training.py 

You can use the pretrained FLOWER checkpoint from hf-link to train your own model on any of the datasets.

Note that during training, the full CALVIN evaluation or LIBERO rollouts are triggered after rollout_lh_skip_epochs epochs and then every callbacks.rollout_lh.rollout_freq*1k training steps. Check the training config to adapt these parameters.

To replicate the original training results, we recommend using 4 GPUs with a batch_size of 8, training for 40k steps on ABC (ABCD), and evaluating after 19 epochs to get the best possible results. See the training configs for details.

Preprocessing with CALVIN

Since FLOWER uses action chunking, it needs to load multiple (~10) episode_{}.npz files for each sample. In combination with batching, this requires large disk bandwidth for each iteration (usually ~2000 MB/iteration), which can significantly reduce your GPU utilization during training, depending on your hardware. Therefore, you can use the script extract_by_key.py to extract the data into a single file, avoiding opening many episode files when using the CALVIN dataset.

Usage example:

python preprocess/extract_by_key.py -i /YOUR/PATH/TO/CALVIN/ \
    --in_task all

Run this command to see more detailed information:

python preprocess/extract_by_key.py -h

Important params:

  • --in_root: /YOUR/PATH/TO/CALVIN/, e.g. /data3/geyuan/datasets/CALVIN/
  • --extract_key: a key of dict(episode_xxx.npz); default is 'rel_actions'. The saved file name depends on this (i.e. ep_{extract_key}.npy)

Optional params:

  • --in_task: default is 'all', meaning all task folders (e.g. task_ABCD_D/) of CALVIN
  • --in_split: default is 'all', meaning both training/ and validation/
  • --out_dir: optional; default is 'None', which is converted to {in_root}/{in_task}/{in_split}/extracted/
  • --force: whether to overwrite existing extracted data

Thanks to @ygtxr1997 for debugging the GPU utilization issue and providing a merge request.
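Once extracted, the consolidated array can be loaded once (and memory-mapped) instead of opening many episode_{}.npz files per batch. Below is a minimal sketch, assuming the default ep_rel_actions.npy file name described above; a zero-filled stand-in array replaces the real extractor output.

```python
import os
import tempfile
import numpy as np

# Hypothetical layout mirroring {in_root}/{in_task}/{in_split}/extracted/ described above.
root = tempfile.mkdtemp()
extracted = os.path.join(root, "task_ABCD_D", "training", "extracted", "ep_rel_actions.npy")
os.makedirs(os.path.dirname(extracted), exist_ok=True)

# Stand-in for the extractor's output: all relative actions concatenated into one array.
np.save(extracted, np.zeros((100, 7), dtype=np.float32))

# mmap_mode="r" keeps the array on disk; slicing reads only the pages it touches,
# so each training iteration issues one small read instead of ~10 npz file opens.
actions = np.load(extracted, mmap_mode="r")
chunk = np.asarray(actions[10:20])  # one 10-step action chunk in a single read
print(chunk.shape)  # (10, 7)
```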

Evaluation

Download the pretrained FLOWER checkpoints from Hugging Face; we provide pretrained checkpoints for all CALVIN and LIBERO challenges.

Performance Comparison on CALVIN Challenges (1000 chains)

Below is the average performance of FLOWER on CALVIN. It is currently state-of-the-art for all variants of CALVIN:

| Train→Test | Method | PrT | 1 | 2 | 3 | 4 | 5 | Avg. Len. |
|---|---|---|---|---|---|---|---|---|
| ABCD→D | Diff-P-CNN | × | 86.3% | 72.7% | 60.1% | 51.2% | 41.7% | 3.16±0.06 |
| | Diff-P-T | × | 78.3% | 53.9% | 33.8% | 20.4% | 11.3% | 1.98±0.09 |
| | RoboFlamingo | | 96.4% | 89.6% | 82.4% | 74.0% | 66.0% | 4.09±0.00 |
| | GR-1 | | 94.9% | 89.6% | 84.4% | 78.9% | 73.1% | 4.21±0.00 |
| | MDT | × | 98.6% | 95.8% | 91.6% | 86.2% | 80.1% | 4.52±0.02 |
| | MoDE | | 97.1% | 92.5% | 87.9% | 83.5% | 77.9% | 4.39±0.04 |
| | KomosVLA | | 98.0% | 93.6% | 85.4% | 77.8% | 70.4% | 4.49 |
| | FLOWER (ours) | × | 99.1% | 97.8% | 95.2% | 92.4% | 87.8% | 4.67±0.04 |
| ABC→D | Diff-P-CNN | × | 63.5% | 35.3% | 19.4% | 10.7% | 6.4% | 1.35±0.05 |
| | Diff-P-T | × | 62.2% | 30.9% | 13.2% | 5.0% | 1.6% | 1.13±0.02 |
| | RoboFlamingo | | 82.4% | 61.9% | 46.6% | 33.1% | 23.5% | 2.47±0.00 |
| | GR-1 | | 85.4% | 71.2% | 59.6% | 49.7% | 40.1% | 3.06±0.00 |
| | 3DDA | × | 93.8% | 80.3% | 66.2% | 53.3% | 41.2% | 3.35 |
| | MoDE | | 96.2% | 88.9% | 81.1% | 71.8% | 63.5% | 4.01±0.04 |
| | KomosVLA | | 98.0% | 93.6% | 85.4% | 77.8% | 70.4% | 4.25 |
| | VPP | | 95.7% | 91.2% | 86.3% | 81.0% | 75.0% | 4.29 |
| | Seer | | 96.3% | 91.6% | 86.1% | 80.3% | 74.0% | 4.28 |
| | FLOWER (ours) | × | 99.3% | 95.9% | 90.5% | 84.8% | 77.5% | 4.54±0.02 |
| D→D | FLOWER (ours) | × | 98.4% | 94.0% | 87.9% | 81.7% | 74.1% | 4.36±0.04 |

Performance on LIBERO Benchmarks

FLOWER achieves strong performance across all LIBERO benchmarks:

| Benchmark | FLOWER Success Rate |
|---|---|
| LIBERO-10 | 94.5% |
| LIBERO-90 | 93.4% |
| LIBERO-SPATIAL | 97.2% |
| LIBERO-OBJECT | 99.3% |
| LIBERO-GOAL | 96.9% |

Common Issues

The following line sometimes causes problems for the Python environment; if so, simply delete it:

log.info(f"Using calvin_env with commit {get_git_commit_hash(Path(calvin_env.__file__))}.")

The line is located in the calvin_env repo: https://github.com/mees/calvin_env/blob/797142c588c21e76717268b7b430958dbd13bf48/calvin_env/envs/play_table_env.py#L72
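If you prefer not to delete the line outright, an alternative is to make it non-fatal. Here is a sketch of the idea; the exact import path for get_git_commit_hash is an assumption, and the except branch simply logs a warning when the lookup fails.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Drop-in replacement for the problematic line in play_table_env.py: make the
# git lookup non-fatal instead of crashing environment creation.
commit = None
try:
    import calvin_env
    from calvin_env.utils.utils import get_git_commit_hash  # import path is an assumption

    commit = get_git_commit_hash(Path(calvin_env.__file__))
    log.info(f"Using calvin_env with commit {commit}.")
except Exception as exc:  # e.g. calvin_env missing, not a git checkout, no git binary
    log.warning(f"Could not determine calvin_env commit hash: {exc}")
```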


Acknowledgements

This work is only possible because of the code from the following open-source projects and datasets. We thank all authors for their work:

CALVIN

Original: https://github.com/mees/calvin

License: MIT

LIBERO

Original: https://github.com/Lifelong-Robot-Learning/LIBERO

License: https://github.com/Lifelong-Robot-Learning/LIBERO?tab=MIT-1-ov-file

Mimictest

Original: mimictest

License: license

HULC

Original: https://github.com/lukashermann/hulc

License: MIT

FLOWER Pretraining Codebase

Original: https://github.com/intuitive-robots/FLOWER_Diffusion_Policy

License: https://github.com/intuitive-robots/FLOWER_Diffusion_Policy/blob/main/LICENSE

Citation

If you found the code useful, please cite our work (arXiv preprint coming very soon):

@inproceedings{
    reuss2025flower,
    ???
}
