TAM: Temporal Adaptive Module for Video Recognition [arXiv]

@inproceedings{liu2021tam,
  title={TAM: Temporal adaptive module for video recognition},
  author={Liu, Zhaoyang and Wang, Limin and Wu, Wayne and Qian, Chen and Lu, Tong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13708--13718},
  year={2021}
}

[NEW!] 2021/07/23 - Our paper has been accepted by ICCV 2021. More pretrained models will be released soon for research purposes. Welcome to follow our work!

[NEW!] 2021/06/01 - Our temporal adaptive module has been integrated into MMAction2! We are glad to see that TAM achieves higher accuracy with MMAction2 on several datasets.

[NEW!] 2020/10/10 - We have released the code of TAM for research purposes.

Overview

We release the PyTorch code of the Temporal Adaptive Module.

The overall architecture of TANet: ResNet-Block vs. TA-Block.

Content

  • Prerequisites
  • Online Training & Testing (Kinetics-400)
  • Data Preparation
  • Model Zoo
  • Testing
  • Training

Prerequisites

The code is built with the following libraries:

Online Training & Testing (Kinetics-400)

The Kinetics-400 dataset contains roughly 300k videos, so extracting all of their frames would incur a large disk cost. We therefore provide an online method: the model takes the video file as input and decodes only the required frames into a cache, instead of storing frames on disk.
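As a rough illustration of this idea (a sketch only, not the repo's exact loader), the required frames can be decoded on the fly with a video-decoding library such as decord:

# minimal sketch of online frame loading; assumes decord is installed,
# the repo's actual loader and caching scheme may differ
import numpy as np
from decord import VideoReader, cpu

def load_segments(video_path, num_segments=8):
    # decode only num_segments uniformly spaced frames, without writing frames to disk
    vr = VideoReader(video_path, ctx=cpu(0))
    ticks = np.linspace(0, len(vr) - 1, num_segments).astype(int)
    return vr.get_batch(ticks).asnumpy()  # (num_segments, H, W, 3) uint8 array

frames = load_segments("Kinetics400/train/part_0/--QUuC4vJs_000084_000095.mp4")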

Video Directory Visualization

Video Directory (after decompressing)
  |_ Kinetics400
    |_ train
    |  |_ [part_0]
    |  |  |_ --QUuC4vJs_000084_000095.mp4
    |  |  |_ {youtube_id}_{time_start:0>6d}_{time_end:0>6d}.mp4 (as in the train.csv file)
    |  |_ [part_1]
    |     |_ ...
    |_ val
    |   |_ part_0
    |      |_ ...
    |   |_ part_1
    |      |_ ...
    |   |_ part_2
    |      |_ ...
    |_ test
    |   |_ part_0
    |      |_ ...
    |   |_ part_1
    |      |_ ...
    |   |_ part_2
    |      |_ ...
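As a quick illustration of the {youtube_id}_{time_start:0>6d}_{time_end:0>6d}.mp4 naming pattern above (assuming the standard Kinetics CSV columns youtube_id, time_start and time_end):

# hypothetical helper reproducing the clip filename pattern shown above
def clip_filename(youtube_id: str, time_start: int, time_end: int) -> str:
    return f"{youtube_id}_{time_start:0>6d}_{time_end:0>6d}.mp4"

print(clip_filename("--QUuC4vJs", 84, 95))  # --QUuC4vJs_000084_000095.mp4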

Generate Video-wise Label

First, please configure the VIDEO_PATH and label_path variables in gen_online_label_kinetics.py.

VIDEO_PATH is the Video Directory (after decompressing) mentioned above, and label_path is the directory that contains the .csv files and where the generated label files (.txt) will be written.

  cd online_label_tools
  python gen_online_label_kinetics.py

After running the gen_online_label_kinetics.py script, label_path looks like this:

label_path (the last 4 .txt files are generated by the gen_online_label_kinetics.py script)
  |_ train.csv
  |_ val.csv
  |_ test.csv
  |_ k400_val_list.txt
  |_ k400_train_list.txt
  |_ missing_k400_val_list.txt
  |_ missing_k400_train_list.txt

Each line in k400_train_list.txt has the format:

  video_path num_frames action_cls
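A minimal sketch of reading such a list file (field names follow the format above; the repo's dataset class may parse it differently):

# parse the generated list into (video_path, num_frames, action_cls) records
def read_video_list(list_path):
    records = []
    with open(list_path) as f:
        for line in f:
            video_path, num_frames, action_cls = line.strip().rsplit(" ", 2)
            records.append((video_path, int(num_frames), int(action_cls)))
    return records

records = read_video_list("k400_train_list.txt")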

Training

We provide several scripts for training TAM in this repo:

  • To train on Kinetics from ImageNet pretrained models, you can run online_scripts/train_tam_kinetics_rgb_8f.sh, which contains:

      python -u online_main.py kinetics RGB --arch resnet50 \
      --num_segments 8 --gd 20 --lr 0.01 --lr_steps 50 75 90 --epochs 100 --batch-size 8 \
      -j 8 --dropout 0.5 --consensus_type=avg --root_log ./checkpoint/this_ckpt \
      --root_model ./checkpoint/this_ckpt --eval-freq=1 --npb \
      --self_conv  --dense_sample --wd 0.0001
  • You can also add '--gpu 0 1 ...' if you have more than one GPU.

Testing

For example, to test the downloaded pretrained models on Kinetics, you can run online_scripts/test_tam_kinetics_rgb_8f.sh. The script tests TAM in the 8-frame setting:

# test TAM on Kinetics-400
python -u online_test_models.py kinetics \
--weights=./checkpoints/kinetics_RGB_resnet50_tam_avg_segment8_e100_dense/ckpt.best.pth.tar \
--test_segments=8 --test_crops=3 \
--full_res --sample dense-10 --batch_size 8

Data Preparation

Following the TSN and TSM repos, we provide a set of tools (vidtools) to extract frames from videos.

For convenience, the processing of video data can be summarized as follows:

  • Extract frames from videos.

    1. First, clone vidtools:

      git clone https://github.com/liu-zhy/vidtools.git && cd vidtools
    2. Extract frames by running:

      python extract_frames.py VIDEOS_PATH/ \
      -o DATASETS_PATH/frames/ \
      -j 16 --out_ext png
      

      We suggest using --out_ext jpg if disk storage is limited.

  • Generate the annotation.

    The annotation usually includes train.txt, val.txt and test.txt (optional); a sketch of generating such a file is shown after this list. Each line of a *.txt file has the format:

    frames/video_1 num_frames label_1
    frames/video_2 num_frames label_2
    frames/video_3 num_frames label_3
    ...
    frames/video_N num_frames label_N
    

    The pre-processed dataset is organized with the following structure:

    datasets
      |_ Kinetics400
        |_ frames
        |  |_ [video_0]
        |  |  |_ img_00001.png
        |  |  |_ img_00002.png
        |  |  |_ ...
        |  |_ [video_1]
        |     |_ img_00001.png
        |     |_ img_00002.png
        |     |_ ...
        |_ annotations
           |_ train.txt
           |_ val.txt
           |_ test.txt (optional)
    
  • Configure the dataset in ops/dataset_configs.py.
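As referenced in the annotation step above, a minimal sketch of writing such an annotation file from the extracted frames could look as follows (label_map is a hypothetical dict mapping video name to class index that you would build from your own labels):

# sketch: build train.txt from a frames directory, assuming one sub-directory per video
import os

def write_annotation(frames_root, label_map, out_path):
    with open(out_path, "w") as f:
        for video_name in sorted(os.listdir(frames_root)):
            video_dir = os.path.join(frames_root, video_name)
            if not os.path.isdir(video_dir):
                continue
            # count the extracted images for this video
            num_frames = len([x for x in os.listdir(video_dir) if x.startswith("img_")])
            f.write(f"frames/{video_name} {num_frames} {label_map[video_name]}\n")

# e.g. write_annotation("datasets/Kinetics400/frames", label_map, "datasets/Kinetics400/annotations/train.txt")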

Model Zoo

Here we provide some off-the-shelf pretrained models. The accuracy may differ slightly from the paper, since the raw Kinetics videos downloaded by different users can vary.

Models   Datasets      Resolution  Frames * Crops * Clips  Top-1  Top-5  Checkpoints
TAM-R50  Kinetics-400  256 * 256   8 * 3 * 10              76.1%  92.3%  ckpt
TAM-R50  Kinetics-400  256 * 256   16 * 3 * 10             76.9%  92.9%  ckpt
TAM-R50  Sth-Sth v1    224 * 224   8 * 1 * 1               46.5%  75.8%  ckpt
TAM-R50  Sth-Sth v1    224 * 224   16 * 1 * 1              47.6%  77.7%  ckpt
TAM-R50  Sth-Sth v2    256 * 256   8 * 3 * 2               62.7%  88.0%  ckpt
TAM-R50  Sth-Sth v2    256 * 256   16 * 3 * 2              64.6%  89.5%  ckpt

After downloading the checkpoints and placing them in the target path, you can test TAM with these pretrained weights.

Testing

For example, to test the downloaded pretrained models on Kinetics, you can run scripts/test_tam_kinetics_rgb_8f.sh. The script tests TAM in the 8-frame setting:

# test TAM on Kinetics-400
python -u test_models.py kinetics \
--weights=./checkpoints/kinetics_RGB_resnet50_tam_avg_segment8_e100_dense/ckpt.best.pth.tar \
--test_segments=8 --test_crops=3 \
--full_res --sample dense-10 --batch_size 8

Note that --sample determines the sampling strategy at test time: with --sample uniform-N the model takes N clips uniformly sampled from the video as input, while with --sample dense-N it takes N densely sampled clips as input.
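For intuition, here is a rough sketch of how the two strategies can pick frame indices; this is a simplified illustration, and the clip length and stride below are assumptions rather than the exact values used by the repo's loader:

import numpy as np

def uniform_clip_indices(num_frames, num_clips, num_segments=8):
    # uniform-N: split the video into num_clips chunks and spread
    # num_segments frame indices evenly over each chunk
    indices = []
    for i in range(num_clips):
        start = num_frames * i / num_clips
        end = num_frames * (i + 1) / num_clips
        indices.append(np.linspace(start, end - 1, num_segments).astype(int))
    return indices

def dense_clip_indices(num_frames, num_clips, num_segments=8, stride=8):
    # dense-N: num_clips windows of num_segments strided consecutive frames,
    # with window start points spread evenly over the video
    span = num_segments * stride
    starts = np.linspace(0, max(num_frames - span, 0), num_clips).astype(int)
    return [np.minimum(s + stride * np.arange(num_segments), num_frames - 1) for s in starts]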

You can also test TAM on Something-Something V2 by running scripts/test_tam_somethingv2_rgb_8f.sh:

# test TAM on Something-Something V2
python -u test_models.py somethingv2 \
--weights=./checkpoints/something_RGB_resnet50_tam_avg_segment8_e50/ckpt.best.pth.tar \
--test_segments=8 --test_crops=3 \
--full_res --sample uniform-2 --batch_size 32

Training

We provide several scripts for training TAM in this repo:

  • To train on Kinetics from ImageNet pretrained models, you can run scripts/train_tam_kinetics_rgb_8f.sh, which contains:

      python -u main.py kinetics RGB --arch resnet50 \
      --num_segments 8 --gd 20 --lr 0.01 --lr_steps 50 75 90 --epochs 100 --batch-size 8 \
      -j 8 --dropout 0.5 --consensus_type=avg --root_log ./checkpoint/this_ckpt \
      --root_model ./checkpoint/this_ckpt --eval-freq=1 --npb \
      --self_conv  --dense_sample --wd 0.0001

    After training, you should obtain a checkpoint comparable to the ones provided above.

  • To train on the Something-Something datasets (V1 & V2), you can run the following commands:

    # train TAM on Something-Something V1
    bash scripts/train_tam_something_rgb_8f.sh
    
    # train TAM on Something-Something V2
    bash scripts/train_tam_somethingv2_rgb_8f.sh
