This repository contains the VL2Lite codebase together with a detection-focused adaptation. It keeps the frozen-VLM distillation recipe but swaps the classification head for torchvision detectors and adds a YOLO-format detection pipeline with optional detector-teacher KD.
- Frozen VLM Teacher: No teacher fine-tuning required
- Detection Pipeline: YOLO-format dataset to torchvision detectors with background label offset
- VLM KD for Detection: Pooled detector features aligned to VLM image/text embeddings
- Optional Detector-Teacher KD: Box/score matching distillation term
- Configurable: Hydra-based setup for custom data, model, or experiment scripts
- Data: `src/data/yolo_detection_dataset.py` converts YOLO labels to torchvision targets and can return CLIP-normalized images for the VLM teacher; `data.label_offset=1` reserves label 0 for background.
- Model: `src/models/components/detection.py` wraps torchvision detectors, pools backbone features for KD, and aligns them with VLM image/text embeddings.
- Training: `src/models/detection_kd_module.py` mixes the detection loss with VLM KD and an optional detector-teacher KD term.
- Configs: `configs/data/detection_yolo.yaml`, `configs/model/kd_detection.yaml`, and `configs/model/detection_teacher.yaml`.
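The exact KD objective lives in `src/models/detection_kd_module.py`; the core idea of the VLM alignment term — pool the student's backbone features, project them, and pull them toward the frozen VLM's embeddings — can be sketched as follows (function and argument names, and the choice of cosine distance, are illustrative assumptions, not the repo's actual code):

```python
import torch
import torch.nn.functional as F


def vlm_kd_loss(backbone_feats: torch.Tensor,
                vlm_image_emb: torch.Tensor,
                vlm_text_emb: torch.Tensor,
                proj: torch.nn.Module) -> torch.Tensor:
    """Align pooled student features with frozen VLM embeddings (sketch).

    backbone_feats: (B, C, H, W) feature map from the student detector.
    vlm_image_emb / vlm_text_emb: (B, D) frozen VLM embeddings.
    proj: learnable projection mapping C -> D (hypothetical name).
    """
    pooled = backbone_feats.mean(dim=(2, 3))       # global average pool -> (B, C)
    student = F.normalize(proj(pooled), dim=-1)    # (B, D), unit norm
    image_t = F.normalize(vlm_image_emb, dim=-1)
    text_t = F.normalize(vlm_text_emb, dim=-1)
    # Cosine-distance alignment to both image and text targets.
    img_loss = (1.0 - (student * image_t).sum(dim=-1)).mean()
    txt_loss = (1.0 - (student * text_t).sum(dim=-1)).mean()
    return img_loss + txt_loss
```

The VLM stays frozen: only the student backbone and the projection head receive gradients from this term.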
- Go to the project folder:

  ```bash
  cd vl2det
  ```

- (Optional) Create a conda environment:

  ```bash
  conda env create -f environment.yaml
  conda activate vl2det
  ```

  or

  ```bash
  conda create -n vl2det python=3.10
  conda activate vl2det
  ```

- Install PyTorch per the official instructions.

- Install the requirements:

  ```bash
  pip install -r requirements.txt
  ```

- (Optional) Editable install (enables the `train_command`/`eval_command` entry points):

  ```bash
  pip install -e .
  ```
By default, datasets live under `data/kd_datasets/` (see `configs/paths/default.yaml`).
For a custom YOLO dataset, either update `configs/data/attributes/detection_example.yaml` or override on the command line:
```bash
python src/train.py data=detection_yolo \
  data.attributes.data_yaml=/absolute/path/to/data.yaml
```

You can also override folder names:

```bash
python src/train.py data=detection_yolo \
  data.attributes.images_train=images/train data.attributes.labels_train=labels/train
```

If you have the raw OPIXray release (train/test folders with `*_annotation` and `*_image`), convert it to YOLO format:

```bash
python scripts/prepare_opixray_yolo.py \
  --data-root data/kd_datasets/OPIXray_raw \
  --output-root data/kd_datasets/OPIXray
```

You can also pass a zip archive with `--zip path/to/opixray.zip`.
You can download OPIXray from https://github.com/OPIXray-author/OPIXray.
For YOLO-style detection datasets, point the config at your images/labels folders or at a YOLO data YAML file.
Expected structure (default):

```
dataset_root/
  images/
    train/
    val/
    test/
  labels/
    train/
    val/
    test/
```
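A quick sanity check that every image in a split has a matching label file can be written in a few lines (this helper is hypothetical, not part of the repo):

```python
from pathlib import Path


def unlabeled_images(root: Path, split: str = "train") -> list[str]:
    """Return image filenames under images/<split> that have no
    corresponding labels/<split>/<stem>.txt file."""
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    return [p.name for p in sorted(image_dir.glob("*"))
            if not (label_dir / f"{p.stem}.txt").exists()]
```

Images without a label file are usually treated as background-only; whether that is intended depends on your dataset.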
Label format (one object per line): `class_id x_center y_center width height`, all coordinates normalized to [0, 1].
By default, labels are offset by +1 so that label 0 is reserved for background (`data.label_offset=1`).
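As an illustration of the conversion performed by the dataset class (the function name and return shape here are assumptions for clarity, not the repo's actual code), one YOLO label line becomes an absolute-coordinate box plus an offset class label:

```python
def yolo_line_to_target(line: str, img_w: int, img_h: int,
                        label_offset: int = 1):
    """Parse 'class_id x_c y_c w h' (normalized) into absolute xyxy + label."""
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Convert center/size to corner coordinates (x1, y1, x2, y2).
    box = [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]
    # Shift class ids so that 0 is free for torchvision's background class.
    label = int(cls) + label_offset
    return box, label
```

For example, `"0 0.5 0.5 0.2 0.4"` on a 100x200 image yields the box `[40.0, 60.0, 60.0, 140.0]` with label `1`.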
Override `data.attributes.*`, or set `data.attributes.data_yaml` to point at a YOLO data YAML file.
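For reference, a minimal YOLO data YAML typically looks like the following (field names follow the common YOLO convention and the class names are placeholders; check `configs/data/attributes/detection_example.yaml` for what this repo actually expects):

```yaml
path: /absolute/path/to/dataset_root
train: images/train
val: images/val
test: images/test
names:
  0: class_a
  1: class_b
```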
Train the student with VLM KD only:

```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu logger=tensorboard \
  model.use_det_teacher=false
```

Train a detector-teacher baseline:

```bash
python src/train.py data=detection_yolo model=detection_teacher callbacks=detection trainer=gpu logger=tensorboard
```

You can also run `bash scripts/train_detection_teacher.sh` for a simple teacher baseline.
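With a trained detector teacher, the optional detector-teacher KD term matches student boxes and scores to the teacher's. A minimal sketch, assuming predictions have already been matched one-to-one (the function name and matching assumption are illustrative; the module's real term may match by IoU and weight the losses differently):

```python
import torch
import torch.nn.functional as F


def det_teacher_kd(student_boxes: torch.Tensor, student_scores: torch.Tensor,
                   teacher_boxes: torch.Tensor, teacher_scores: torch.Tensor
                   ) -> torch.Tensor:
    """Box/score matching KD for 1:1-matched predictions (sketch).

    Boxes are (N, 4) xyxy tensors; scores are (N,) confidences.
    """
    box_loss = F.smooth_l1_loss(student_boxes, teacher_boxes)  # regress toward teacher boxes
    score_loss = F.mse_loss(student_scores, teacher_scores)    # match teacher confidences
    return box_loss + score_loss
```

This term is added on top of the standard detection loss and the VLM KD loss when `model.use_det_teacher=true`.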
Train the student with both VLM KD and detector-teacher KD:

```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu logger=tensorboard \
  model.use_det_teacher=true \
  model.det_teacher.ckpt_path=logs/train/runs/<run>/checkpoints/epoch_XXX.ckpt
```

Evaluate a checkpoint:

```bash
python src/eval.py data=detection_yolo model=kd_detection ckpt_path=/absolute/path/to/ckpt.ckpt
```

EfficientDet students require `effdet` and `timm` (both included in `requirements.txt`).
```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu \
  model.net.student.arch=efficientdet_d0 model.net.student.image_size=512
```

```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu \
  model.net.student.arch=efficientdet_d1 model.net.student.image_size=640
```

Plot TensorBoard scalars from one or more runs:

```bash
python scripts/plot_tensorboard_scalars.py logs/train/runs/<run1> logs/train/runs/<run2> \
  --legends run1 run2 --out-dir plots/tensorboard
```

Run inference on a folder of images:

```bash
python scripts/infer_detection.py /path/to/images \
  --ckpt logs/train/runs/<run>/checkpoints/last.ckpt \
  --arch fasterrcnn_mobilenet_v3_large_fpn \
  --num-classes 5 \
  --data-yaml /path/to/data.yaml \
  --out-dir outputs/infer \
  --score-threshold 0.3 \
  --recursive
```

We use PyTorch Lightning for training loops and Hydra for configuration. You can override any config from the CLI, for example:
```bash
python src/train.py data=detection_yolo model=kd_detection trainer=gpu \
  data.batch_size=16 trainer.max_epochs=50
```

Pick a config from `configs/experiment/`:

```bash
python src/train.py experiment=experiment_name
```

- trainer configs in `configs/trainer/`
- data configs in `configs/data/`
- model configs in `configs/model/`
- experiment configs in `configs/experiment/`
Hydra allows combining or overriding these configs easily.
Built upon the Lightning-Hydra-Template.
We thank open-source projects (PyTorch, Lightning, Hydra) that enable this work.
If you use this code or find VL2Lite helpful, please cite:
```bibtex
@inproceedings{jang2025vl2lite,
  title={VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks},
  author={Jang, Jinseong and Ma, Chunfei and Lee, Byeongwon},
  booktitle={CVPR},
  year={2025}
}
```

This project is licensed under the MIT License. Please see the LICENSE file for details.