This repository contains the implementation of Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control (SQIL), accepted to ICCV 2025. It provides fine-tuning pipelines for adapting OpenVLA models on the LIBERO benchmark datasets using the SQIL framework.
SQIL is the first systematic study of Quantized Imitation Learning, revealing that most failures of quantized policies occur at mission-critical states requiring fine-grained control. By leveraging a policy-driven saliency measure, the State Importance Score (SIS), and a SIS-weighted 4-bit QAT scheme, SQIL achieves 2–4× efficiency gains while preserving full-precision-level success rates across real-world robotics, autonomous driving, and physics simulation.
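To make the SIS-weighted objective concrete, here is a minimal sketch in plain Python. The softmax-style reweighting and the temperature are illustrative assumptions, not the exact formulation from the paper; the real weighting is implemented in `vla-scripts/finetune_sqil.py`.

```python
import math

def sis_weights(saliency, temperature=1.0):
    """Turn per-sample saliency scores into normalized loss weights.

    Assumption: higher saliency => larger weight, via a softmax over
    the scores. The actual SQIL weighting is defined in the paper;
    this is only an illustrative stand-in.
    """
    exps = [math.exp(s / temperature) for s in saliency]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_imitation_loss(per_sample_losses, saliency, temperature=1.0):
    """SIS-modulated imitation loss: weighted sum of per-sample losses."""
    weights = sis_weights(saliency, temperature)
    return sum(w * l for w, l in zip(weights, per_sample_losses))

# High-saliency (mission-critical) states dominate the objective:
losses = [0.2, 0.9, 0.1]
saliency = [0.1, 2.0, 0.1]   # middle sample is a critical state
print(weighted_imitation_loss(losses, saliency))
```

With uniform saliency this reduces to the ordinary mean imitation loss; raising one sample's score shifts the optimization pressure toward that state, which is the intuition behind SQIL's treatment of mission-critical states.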
## Installation

```bash
# Create and activate environment
conda create -n sqil python=3.10 -y
conda activate sqil

# Install PyTorch (adjust CUDA version if needed)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y

# Clone repository
git clone https://github.com/aiha-lab/sqil.git
cd sqil
pip install -e .

# (Optional) for Flash-Attention 2
pip install packaging ninja
pip install "flash-attn==2.5.5" --no-build-isolation
```

Clone and install the LIBERO repo:
```bash
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
```

Additionally, install the other required packages (the same as OpenVLA):
```bash
cd sqil
pip install -r experiments/robot/libero/libero_requirements.txt
```

## LIBERO Datasets

To download the modified versions of the LIBERO datasets that we used in our fine-tuning experiments, run the command below. This will download the LIBERO-Spatial, LIBERO-Object, LIBERO-Goal, and LIBERO-10 (Long) datasets in RLDS data format (~10 GB total) released by the OpenVLA project.
```bash
git clone git@hf.co:datasets/openvla/modified_libero_rlds
```

## Quantized Policy

Before running SQIL, you need a low-precision (quantized) VLA policy. Please obtain one using an existing quantization repo.
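For orientation, 4-bit QAT pipelines typically simulate low precision with "fake quantization" (quantize, then dequantize, in the forward pass). The sketch below shows per-tensor symmetric INT4 fake quantization in plain Python; it is an assumption-laden illustration, not the scheme of any specific repo (real implementations add per-channel scales, calibrated clipping, and a straight-through estimator for gradients).

```python
def fake_quant_4bit(weights):
    """Simulate symmetric per-tensor 4-bit quantization (INT4: -8..7).

    Quantize-dequantize so downstream code still sees floats, as in
    standard QAT. Illustrative only: real repos use per-channel scales
    and learned/calibrated clipping ranges.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / 7.0                        # map largest |weight| to 7
    # Clamp to the INT4 range; -8 is unused with this symmetric scale,
    # so the clamp is only a safety net.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return [qi * scale for qi in q]

w = [0.51, -0.32, 0.07, -0.49]
print(fake_quant_4bit(w))
```

Each output value lands on one of at most 16 levels spaced `scale` apart, which is why fine-grained control actions are the first thing a quantized policy gets wrong.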
## SQIL Fine-Tuning

**Step 1: Precompute SIS.** Compute per-sample State Importance Scores (SIS) using a frozen teacher VLA.

```bash
torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune_sqil.py \
  --precompute True \
  --data_root_dir <PATH/TO/RLDS> \
  --dataset_name <DATASET_NAME> \
  --saliency_cache_path <PATH/TO/SAVE/sis_cache.jsonl> \
  --batch_size 16 \
  --image_aug False
```

**Step 2: SIS-weighted QAT.** Fine-tune a quantized VLA using the cached SIS scores to modulate the imitation loss.
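The saliency cache is a JSON-lines file; the exact record schema is defined by `finetune_sqil.py`. Purely for illustration, assuming one record per sample with an id and a scalar score (the field names `"sample_id"` and `"sis"` here are hypothetical), it can be inspected like this:

```python
import json

def load_sis_cache(path):
    """Load a JSON-lines SIS cache into {sample_id: score}.

    The field names "sample_id" and "sis" are assumptions for this
    sketch; check finetune_sqil.py for the real schema.
    """
    scores = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:          # tolerate blank lines
                continue
            record = json.loads(line)
            scores[record["sample_id"]] = float(record["sis"])
    return scores
```

Loading the cache up front lets the fine-tuning step look up each sample's weight without rerunning the teacher.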
```bash
torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune_sqil.py \
  --precompute False \
  --vla_path <PATH/TO/QUANTIZED_VLA> \
  --data_root_dir <PATH/TO/RLDS> \
  --dataset_name <DATASET_NAME> \
  --saliency_cache_path <PATH/TO/SAVE/sis_cache.jsonl> \
  --run_root_dir <PATH/TO/RUNS> \
  --adapter_tmp_dir <PATH/TO/ADAPTER_TMP> \
  --batch_size 16 \
  --learning_rate 2e-5 \
  --image_aug True \
  --wandb_project sqil \
  --wandb_entity <YOUR_WANDB_ID>
```

## Evaluation

After fine-tuning, you can evaluate the resulting SQIL policy on the LIBERO benchmarks using the same evaluation pipeline as OpenVLA.
We provide evaluation scripts under experiments/robot/libero/.
```bash
# Example: evaluate SQIL-tuned policies on LIBERO

# Launch LIBERO-Spatial evals
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint <PATH/TO/YOUR_SQIL_CHECKPOINT_SPATIAL> \
  --task_suite_name libero_spatial \
  --center_crop True

# Launch LIBERO-Object evals
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint <PATH/TO/YOUR_SQIL_CHECKPOINT_OBJECT> \
  --task_suite_name libero_object \
  --center_crop True

# Launch LIBERO-Goal evals
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint <PATH/TO/YOUR_SQIL_CHECKPOINT_GOAL> \
  --task_suite_name libero_goal \
  --center_crop True

# Launch LIBERO-10 (LIBERO-Long) evals
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint <PATH/TO/YOUR_SQIL_CHECKPOINT_LONG> \
  --task_suite_name libero_10 \
  --center_crop True
```

## Repository Structure

```
sqil/
├── vla-scripts/
│   └── finetune_sqil.py   # main fine-tuning entry
├── prismatic/             # OpenVLA / Prismatic modules
└── datasets/              # RLDS-formatted datasets
```
## Citation

If you find this repository useful, please cite:

```bibtex
@inproceedings{park2025sqil,
  title     = {Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control},
  author    = {Park, Seongmin and Kim, Hyungmin and Kim, Sangwoo and Jeon, Wonseok and Yang, Juyoung and Jeon, Byeongwook and Oh, Yoonseon and Choi, Jungwook},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```

## Acknowledgments

SQIL builds upon OpenVLA and Prismatic VLMs. We thank the Open X-Embodiment and LIBERO teams for the datasets and simulation benchmarks.