This repository contains the VL2Lite codebase together with a detection-focused adaptation. It keeps the frozen-VLM distillation recipe but swaps the classification head for torchvision detectors and adds a YOLO-format detection pipeline with optional detector-teacher KD.
- Frozen VLM Teacher: No teacher fine-tuning required
- Detection Pipeline: YOLO-format dataset to torchvision detectors with background label offset
- VLM KD for Detection: Pooled detector features aligned to VLM image/text embeddings
- Optional Detector-Teacher KD: Box/score matching distillation term
- Configurable: Hydra-based setup for custom data, model, or experiment scripts
- Data: `src/data/yolo_detection_dataset.py` converts YOLO labels to torchvision targets and can return CLIP-normalized images for the VLM teacher; `data.label_offset=1` reserves label 0 for background.
- Model: `src/models/components/detection.py` wraps torchvision detectors, pools backbone features for KD, and aligns them with VLM image/text embeddings.
- Training: `src/models/detection_kd_module.py` mixes the detection loss with VLM KD and an optional detector-teacher KD term.
- Configs: `configs/data/detection_yolo.yaml`, `configs/model/kd_detection.yaml`, and `configs/model/detection_teacher.yaml`.
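The exact KD objective lives in `src/models/detection_kd_module.py`; the core idea of the VLM alignment term — pool the student's backbone features, project them, and pull them toward the frozen VLM's embeddings — can be sketched as follows (function and argument names, and the choice of cosine distance, are illustrative assumptions, not the repo's actual code):

```python
import torch
import torch.nn.functional as F


def vlm_kd_loss(backbone_feats: torch.Tensor,
                vlm_image_emb: torch.Tensor,
                vlm_text_emb: torch.Tensor,
                proj: torch.nn.Module) -> torch.Tensor:
    """Align pooled student features with frozen VLM embeddings (sketch).

    backbone_feats: (B, C, H, W) feature map from the student detector.
    vlm_image_emb / vlm_text_emb: (B, D) frozen VLM embeddings.
    proj: learnable projection mapping C -> D (hypothetical name).
    """
    pooled = backbone_feats.mean(dim=(2, 3))       # global average pool -> (B, C)
    student = F.normalize(proj(pooled), dim=-1)    # (B, D), unit norm
    image_t = F.normalize(vlm_image_emb, dim=-1)
    text_t = F.normalize(vlm_text_emb, dim=-1)
    # Cosine-distance alignment to both image and text targets.
    img_loss = (1.0 - (student * image_t).sum(dim=-1)).mean()
    txt_loss = (1.0 - (student * text_t).sum(dim=-1)).mean()
    return img_loss + txt_loss
```

The VLM stays frozen: only the student backbone and the projection head receive gradients from this term.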
- Go to the project folder:

  ```bash
  cd vl2det
  ```

- (Optional) Create a conda environment:

  ```bash
  conda env create -f environment.yaml
  conda activate vl2det
  ```

  or

  ```bash
  conda create -n vl2det python=3.10
  conda activate vl2det
  ```

- Install PyTorch per the official instructions.

- Install the requirements:

  ```bash
  pip install -r requirements.txt
  ```

- (Optional) Editable install (enables the `train_command`/`eval_command` entry points):

  ```bash
  pip install -e .
  ```
By default, datasets live under `data/kd_datasets/` (see `configs/paths/default.yaml`).
For a custom YOLO dataset, either update `configs/data/attributes/detection_example.yaml` or override on the command line:
```bash
python src/train.py data=detection_yolo \
  data.attributes.data_yaml=/absolute/path/to/data.yaml
```

You can also override folder names:

```bash
python src/train.py data=detection_yolo \
  data.attributes.images_train=images/train data.attributes.labels_train=labels/train
```

If you have the raw OPIXray release (train/test folders with `*_annotation` and `*_image`), convert it to YOLO format:

```bash
python scripts/prepare_opixray_yolo.py \
  --data-root data/kd_datasets/OPIXray_raw \
  --output-root data/kd_datasets/OPIXray
```

You can also pass a zip archive with `--zip path/to/opixray.zip`.
You can download OPIXray from https://github.com/OPIXray-author/OPIXray.
For YOLO-style detection datasets, point the config at your images/labels folders or at a YOLO data YAML file.
Expected structure (default):

```
dataset_root/
  images/
    train/
    val/
    test/
  labels/
    train/
    val/
    test/
```
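A quick sanity check that every image in a split has a matching label file can be written in a few lines (this helper is hypothetical, not part of the repo):

```python
from pathlib import Path


def unlabeled_images(root: Path, split: str = "train") -> list[str]:
    """Return image filenames under images/<split> that have no
    corresponding labels/<split>/<stem>.txt file."""
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    return [p.name for p in sorted(image_dir.glob("*"))
            if not (label_dir / f"{p.stem}.txt").exists()]
```

Images without a label file are usually treated as background-only; whether that is intended depends on your dataset.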
Label format (one object per line): `class_id x_center y_center width height`, all coordinates normalized to [0, 1].
By default, labels are offset by +1 so that label 0 is reserved for background (`data.label_offset=1`).
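As an illustration of the conversion performed by the dataset class (the function name and return shape here are assumptions for clarity, not the repo's actual code), one YOLO label line becomes an absolute-coordinate box plus an offset class label:

```python
def yolo_line_to_target(line: str, img_w: int, img_h: int,
                        label_offset: int = 1):
    """Parse 'class_id x_c y_c w h' (normalized) into absolute xyxy + label."""
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Convert center/size to corner coordinates (x1, y1, x2, y2).
    box = [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]
    # Shift class ids so that 0 is free for torchvision's background class.
    label = int(cls) + label_offset
    return box, label
```

For example, `"0 0.5 0.5 0.2 0.4"` on a 100x200 image yields the box `[40.0, 60.0, 60.0, 140.0]` with label `1`.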
Override `data.attributes.*`, or set `data.attributes.data_yaml` to point at a YOLO data YAML file.
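For reference, a minimal YOLO data YAML typically looks like the following (field names follow the common YOLO convention and the class names are placeholders; check `configs/data/attributes/detection_example.yaml` for what this repo actually expects):

```yaml
path: /absolute/path/to/dataset_root
train: images/train
val: images/val
test: images/test
names:
  0: class_a
  1: class_b
```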
Train the student with VLM KD only:

```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu logger=tensorboard \
  model.use_det_teacher=false
```

Train a detector-teacher baseline:

```bash
python src/train.py data=detection_yolo model=detection_teacher callbacks=detection trainer=gpu logger=tensorboard
```

You can also run `bash scripts/train_detection_teacher.sh` for a simple teacher baseline.
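With a trained detector teacher, the optional detector-teacher KD term matches student boxes and scores to the teacher's. A minimal sketch, assuming predictions have already been matched one-to-one (the function name and matching assumption are illustrative; the module's real term may match by IoU and weight the losses differently):

```python
import torch
import torch.nn.functional as F


def det_teacher_kd(student_boxes: torch.Tensor, student_scores: torch.Tensor,
                   teacher_boxes: torch.Tensor, teacher_scores: torch.Tensor
                   ) -> torch.Tensor:
    """Box/score matching KD for 1:1-matched predictions (sketch).

    Boxes are (N, 4) xyxy tensors; scores are (N,) confidences.
    """
    box_loss = F.smooth_l1_loss(student_boxes, teacher_boxes)  # regress toward teacher boxes
    score_loss = F.mse_loss(student_scores, teacher_scores)    # match teacher confidences
    return box_loss + score_loss
```

This term is added on top of the standard detection loss and the VLM KD loss when `model.use_det_teacher=true`.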
Train the student with both VLM KD and detector-teacher KD:

```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu logger=tensorboard \
  model.use_det_teacher=true \
  model.det_teacher.ckpt_path=logs/train/runs/<run>/checkpoints/epoch_XXX.ckpt
```

Evaluate a checkpoint:

```bash
python src/eval.py data=detection_yolo model=kd_detection ckpt_path=/absolute/path/to/ckpt.ckpt
```

EfficientDet students require `effdet` and `timm` (both included in `requirements.txt`).
```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu \
  model.net.student.arch=efficientdet_d0 model.net.student.image_size=512
```

```bash
python src/train.py data=detection_yolo model=kd_detection callbacks=detection trainer=gpu \
  model.net.student.arch=efficientdet_d1 model.net.student.image_size=640
```

Plot TensorBoard scalars from one or more runs:

```bash
python scripts/plot_tensorboard_scalars.py logs/train/runs/<run1> logs/train/runs/<run2> \
  --legends run1 run2 --out-dir plots/tensorboard
```

Run inference on a folder of images:

```bash
python scripts/infer_detection.py /path/to/images \
  --ckpt logs/train/runs/<run>/checkpoints/last.ckpt \
  --arch fasterrcnn_mobilenet_v3_large_fpn \
  --num-classes 5 \
  --data-yaml /path/to/data.yaml \
  --out-dir outputs/infer \
  --score-threshold 0.3 \
  --recursive
```

We use PyTorch Lightning for training loops and Hydra for configuration. You can override any config from the CLI, for example:
```bash
python src/train.py data=detection_yolo model=kd_detection trainer=gpu \
  data.batch_size=16 trainer.max_epochs=50
```

Pick a config from `configs/experiment/`:

```bash
python src/train.py experiment=experiment_name
```

- trainer configs in `configs/trainer/`
- data configs in `configs/data/`
- model configs in `configs/model/`
- experiment configs in `configs/experiment/`
Hydra allows combining or overriding these configs easily.
Built upon the Lightning-Hydra-Template.
We thank open-source projects (PyTorch, Lightning, Hydra) that enable this work.
If you use this code or find VL2Lite helpful, please cite:
```bibtex
@inproceedings{jang2025vl2lite,
  title={VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks},
  author={Jang, Jinseong and Ma, Chunfei and Lee, Byeongwon},
  booktitle={CVPR},
  year={2025}
}
```

This project is licensed under the MIT License. Please see the LICENSE file for details.