[📄 Paper] | [🤗 Model (EchoTraffic)] | [🤗 Dataset (AV-TAU)]
We release the AV-TAU dataset to support audio-visual traffic anomaly understanding.
👉 Available on AV-TAU
Pre-trained and Supervised Fine‑tuning (SFT) checkpoints for EchoTraffic are publicly available.
👉 Available on EchoTraffic
# Clone the repository
git clone https://github.com/HarryHsing/EchoTraffic
cd EchoTraffic
# Create conda environment
conda env create -f environment.yml
conda activate echotraffic
python inference.py \
--video-path ./test_video/036680.mp4 \
--prompt "What unusual event takes place in the video?" \
--model-path ./ckpt/videollama_video_audio_sft/checkpoint_0.pth \
--cfg-path ./eval_configs/finetune_eval.yaml \
--gpu-id 0
⚙️ Trained on A6000 (48G) GPUs.
# Pre-training
torchrun --nproc_per_node=8 train.py --cfg-path ./train_configs/video_audio_pretrain.yaml
# Supervised fine-tuning (SFT)
torchrun --nproc_per_node=4 train.py --cfg-path ./train_configs/video_audio_finetune.yaml
You can download EchoTraffic from Hugging Face.
./ckpt/
├── llama-2-7b-chat-hf
├── videollama_video_audio_pretrain # Pre‑training checkpoint
├── videollama_video_audio_sft # Supervised Fine‑tuning (SFT) checkpoint
└── imagebind_huge.pth
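Once the checkpoints are downloaded, a quick sanity check can catch a misplaced file before a run fails partway through loading. The following is a minimal sketch, not part of the repository: the helper names `missing_checkpoints` and `build_inference_command` are hypothetical, and the code only assumes the directory layout and the `inference.py` flags shown in this README.

```python
import shlex
from pathlib import Path

# Entries listed in the checkpoint tree above, relative to ./ckpt/.
EXPECTED_ENTRIES = (
    "llama-2-7b-chat-hf",
    "videollama_video_audio_pretrain",
    "videollama_video_audio_sft",
    "imagebind_huge.pth",
)


def missing_checkpoints(ckpt_root):
    """Return the expected entries that are absent under ckpt_root (hypothetical helper)."""
    root = Path(ckpt_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def build_inference_command(video_path, prompt, ckpt_root="./ckpt", gpu_id=0):
    """Assemble the README's inference.py call as an argument list (hypothetical helper)."""
    # The SFT checkpoint file name is taken from the inference example in this README.
    model_path = Path(ckpt_root) / "videollama_video_audio_sft" / "checkpoint_0.pth"
    return [
        "python", "inference.py",
        "--video-path", str(video_path),
        "--prompt", prompt,
        "--model-path", str(model_path),
        "--cfg-path", "./eval_configs/finetune_eval.yaml",
        "--gpu-id", str(gpu_id),
    ]


if __name__ == "__main__":
    missing = missing_checkpoints("./ckpt")
    if missing:
        print("Missing under ./ckpt:", ", ".join(missing))
    else:
        cmd = build_inference_command(
            "./test_video/036680.mp4",
            "What unusual event takes place in the video?",
        )
        # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
        print(shlex.join(cmd))
```

Returning the command as a list rather than a string keeps the prompt intact as a single argument, so quotes or spaces in the prompt do not need manual escaping.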
You can download AV-TAU from Hugging Face.
./datasets/AV-TAU/
├── videos/ # raw video files
├── annotations/
│ ├── sft_formatted_train.json # annotation file
│ └── sft_formatted_test.json # annotation file
You can download InternVid-10M-FLT from OpenDataLab.
./datasets/vd-foundation___InternVid-10M-FLT/
├── InternVid-10M-FLT-INFO_with_audio.jsonl # annotation file (exact name may vary)
└── raw/ # raw video files
If you use this dataset or our paper, please cite:
@InProceedings{Xing_2025_CVPR,
author = {Xing, Zhenghao and Chen, Hao and Xie, Binzhu and Xu, Jiaqi and Guo, Ziyu and Xu, Xuemiao and Hao, Jianye and Fu, Chi-Wing and Hu, Xiaowei and Heng, Pheng-Ann},
title = {EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
pages = {19098-19108}
}