DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
This repository contains the official implementation of the paper DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer.
cd util
python setup.py install --user # build NMS
cd ..
cd models/digit/ops
python setup.py build install
cd ../../..
We follow the ActionFormer repository and Video Mamba Suite to prepare the datasets, including THUMOS14, ActivityNet v1.3, and HACS-Segment.
Use scripts/make_feature_info.py to generate feature information for each dataset.
THUMOS14 is already prepared in the repository.
To train the DiGIT model on the THUMOS14 dataset, execute the following command:
python main.py --c config/digit/internvideo2/thumos14.py --output_dir logs/thumos14
To evaluate the trained model and obtain performance metrics, run the following command:
python main.py --eval --c config/digit/internvideo2/thumos14.py --output_dir logs/thumos14
If you find our work helpful, please consider citing our paper:
@InProceedings{Kim_2025_CVPR,
author = {Kim, Ho-Joong and Lee, Yearang and Hong, Jung-Ho and Lee, Seong-Whan},
title = {DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {24286-24296}
}
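
The training and evaluation commands given earlier share the same config file and output directory, so they can be chained in a small wrapper. The following is a minimal sketch, not a script shipped with the repository: the `CONFIG`, `OUTPUT_DIR`, and `DRY_RUN` variables and the `run` helper are assumptions introduced here for illustration. By default it only prints the commands; set `DRY_RUN=0` to actually execute them from the repository root.

```shell
#!/usr/bin/env bash
# Sketch: train, then evaluate, using the same config and output directory.
# CONFIG, OUTPUT_DIR, DRY_RUN and run() are hypothetical names, not repository options.
set -euo pipefail

CONFIG="${CONFIG:-config/digit/internvideo2/thumos14.py}"
OUTPUT_DIR="${OUTPUT_DIR:-logs/thumos14}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"             # show the command without running it
  else
    "$@"                  # run the command; set -e aborts on failure
  fi
}

# Step 1: training (same command as in the training instructions).
run python main.py --c "$CONFIG" --output_dir "$OUTPUT_DIR"

# Step 2: evaluation of the trained model (same command plus --eval).
run python main.py --eval --c "$CONFIG" --output_dir "$OUTPUT_DIR"
```

Because `set -e` is enabled, evaluation is skipped if training exits with an error when the script is run with `DRY_RUN=0`.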
