PyTorch Implementation of our IJCV paper:
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian, Yichao Yan, Guangtao Zhai, Guodong Guo, and Zhiyong Gao [IJCV] [ArXiv]
Efficiently modeling spatial-temporal information in videos is crucial for action recognition. In this paper, we propose a unified action recognition framework that investigates the dynamic nature of video content with the following designs. First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects with a Transformer, which yields a sparse paradigm. We call the proposed framework the Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit the short-term motions within local segments, we further propose a novel and efficient Latent Motion Code (LMC) module, which improves the performance of the framework.
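To give a rough feel for the sparse aggregation idea described above, here is a conceptual PyTorch sketch (not the code in this repo): a few "foreground" tokens are selected per video and a Transformer models only their interactions. All module names, dimensions, and the scoring rule below are illustrative assumptions.

```python
# Conceptual sketch only -- not the implementation in this repo.
import torch
import torch.nn as nn

class SparseTransformerAggregation(nn.Module):
    """Select a few foreground tokens, then let a Transformer model their interactions."""
    def __init__(self, dim=256, num_selected=8, num_heads=4, num_layers=2):
        super().__init__()
        self.num_selected = num_selected
        self.score = nn.Linear(dim, 1)  # assumed "foreground-ness" scoring rule
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, tokens):
        # tokens: (batch, num_candidates, dim) local spatial-temporal cues
        scores = self.score(tokens).squeeze(-1)                  # (B, N)
        _, topk = scores.topk(self.num_selected, dim=1)          # keep only a few foreground tokens
        idx = topk.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        selected = tokens.gather(1, idx)                         # (B, K, dim)
        out = self.encoder(selected.transpose(0, 1))             # encoder expects (K, B, dim)
        return out.mean(dim=0)                                   # global video representation

if __name__ == "__main__":
    video_tokens = torch.randn(2, 64, 256)  # e.g. 64 candidate tokens from 2 videos
    print(SparseTransformerAggregation()(video_tokens).shape)   # torch.Size([2, 256])
```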
Please make sure the following libraries are installed successfully:
- PyTorch >= 1.0
- tqdm
- scikit-learn
Following common practice, we first extract videos into frames for fast data loading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on the Something-Something-V1 and V2, Kinetics, and Diving48 datasets with this codebase. The processing of video data can be summarized in 3 steps:
- Extract frames from videos:
  - For the Something-Something-V2 dataset, please use data_process/vid2img_sthv2.py
  - For the Kinetics dataset, please use data_process/vid2img_kinetics.py
  - For the Diving48 dataset, please use data_process/extract_frames_diving48.py
- Generate the file lists needed by the dataloader (a minimal generator sketch follows this list):
  - Each line of a list file contains a tuple of (extracted video frame folder name, video frame number, video groundtruth class). A list file looks like this:
    ```
    video_frame_folder 100 10
    video_2_frame_folder 150 31
    ...
    ```
  - Or you can use the off-the-shelf tools provided in this repo: data_process/gen_label_xxx.py
- Edit the dataset config information in datasets_video.py
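For reference, below is a minimal, hypothetical sketch of how such a list file could be generated from a folder of extracted frames. The directory layout and the label mapping are assumptions; the data_process/gen_label_xxx.py tools are the reference way to do this.

```python
# Minimal sketch only. Assumptions: frames live in <frames_root>/<video_name>/*.jpg and
# `labels` maps each video folder name to its groundtruth class id. The repo's
# data_process/gen_label_xxx.py scripts are the reference implementation.
import os

def write_file_list(frames_root, labels, output_path):
    """Write one line per video: '<frame_folder> <num_frames> <class_id>'."""
    with open(output_path, "w") as f:
        for video_name in sorted(os.listdir(frames_root)):
            frame_dir = os.path.join(frames_root, video_name)
            if not os.path.isdir(frame_dir) or video_name not in labels:
                continue
            num_frames = len([x for x in os.listdir(frame_dir) if x.endswith(".jpg")])
            f.write(f"{video_name} {num_frames} {labels[video_name]}\n")

# Hypothetical usage:
# write_file_list("data/sthv1/frames", {"video_frame_folder": 10}, "train_videofolder.txt")
```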
Here, we provide pretrained EAN models on the Something-Something-V1 dataset. Recognizing actions in this dataset requires strong temporal modeling ability. EAN achieves state-of-the-art performance on this dataset. Notably, our method even surpasses optical-flow-based methods while using only RGB frames as input.
Model | Backbone | FLOPs | Val Top1 | Val Top5 | Checkpoints |
---|---|---|---|---|---|
EAN8F (RGB+LMC) | ResNet-50 | 37G | 53.4 | 81.1 | [Jianguo Cloud] |
EAN16 (RGB+LMC) | ResNet-50 | 74G | 54.7 | 82.3 | |
EAN16+8 (RGB+LMC) | ResNet-50 | 111G | 57.2 | 83.9 | |
EAN2×(16+8) (RGB+LMC) | ResNet-50 | 222G | 57.5 | 84.3 | |
For example, to test the EAN models on Something-Something-V1, you can first put the downloaded .pth.tar
files into the "pretrained" folder and then run:
```bash
# test the EAN model with 8-frame clips
bash scripts/test/sthv1/RGB_LMC_8F.sh

# test the EAN model with 16-frame clips
bash scripts/test/sthv1/RGB_LMC_16F.sh
```
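If you want to sanity-check a downloaded checkpoint before launching the scripts, a plain torch.load works; the file name and the keys shown below are assumptions about typical .pth.tar contents, not guarantees about these particular files.

```python
# Quick sanity check of a downloaded checkpoint. The file name and the dictionary keys
# are assumptions -- adjust them to what you actually downloaded and find inside.
import torch

ckpt = torch.load("pretrained/EAN_sthv1_8F.pth.tar", map_location="cpu")
if isinstance(ckpt, dict):
    print(ckpt.keys())  # often something like 'state_dict', 'epoch', 'best_prec1'
else:
    print(type(ckpt))
```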
We provide several scripts to train EAN with this repo; please refer to the "scripts" folder for more details. For example, to train EAN on Something-Something-V1, you can run:
```bash
# train the EAN model with 8-frame clips
bash scripts/train/sthv1/RGB_LMC_8F.sh
```
Note that you should scale up the learning rate with the batch size. For example, if you use a batch size of 32, set the learning rate to 0.005.
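As a concrete illustration of that rule (assuming simple linear scaling anchored at the batch size 32 / learning rate 0.005 reference point above; the helper name is hypothetical):

```python
# Hypothetical helper: scale the learning rate linearly with the batch size,
# anchored at the reference setting above (batch size 32 -> lr 0.005).
def scaled_lr(batch_size, base_lr=0.005, base_batch_size=32):
    return base_lr * batch_size / base_batch_size

print(scaled_lr(32))  # 0.005
print(scaled_lr(64))  # 0.01
print(scaled_lr(16))  # 0.0025
```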
This repository is built upon the following baseline implementations for the action recognition task.
Please [★star] this repo and [cite] the following paper if you find our EAN useful for your research:
@article{tian2022ean,
  title={EAN: Event Adaptive Network for Enhanced Action Recognition},
author={Tian, Yuan and Yan, Yichao and Zhai, Guangtao and Guo, Guodong and Gao, Zhiyong},
journal={International Journal of Computer Vision},
volume={130},
number={10},
pages={2453--2471},
year={2022},
publisher={Springer}
}
For any questions, please feel free to open an issue or contact:
Yuan Tian: ee_tianyuan@sjtu.edu.cn