A causal perspective on compositional action recognition, providing a counterfactual debiasing inference implementation to mitigate the object appearance bias in videos.

pengzhansun/Counterfactual-Debiasing-Network

Counterfactual Debiasing Inference for Compositional Action Recognition

This code base is the PyTorch implementation of the paper:
Counterfactual Debiasing Inference for Compositional Action Recognition, ACM Multimedia, 2021
*Pengzhan Sun, *Bo Wu, Xunsong Li, Wen Li, Lixin Duan, Chuang Gan

Introduction

The Counterfactual Debiasing Network (CDN) is designed to remove the harmful object appearance bias while keeping the useful object appearance cue for action recognition. Recent action recognition models tend to rely on object appearance as a shortcut and thus fail to sufficiently learn the action itself. On the one hand, object appearance is a bias that misleads the model into wrong predictions, because an object co-occurs with different action classes in the training and test stages. On the other hand, object appearance is a meaningful cue that can help the model learn about actions.

Task Setting

There are two disjoint action sets {1, 2} and two disjoint object sets {A, B}. For the compositional action recognition task, the training set is {1A + 2B} and the validation set is {1B + 2A}. Under this challenging setting, the model must recognize new combinations of actions and objects. In this problem setting, there are 174 action categories with 54,919 training and 57,876 validation instances. More details can be found in Something-Else.
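The split can be illustrated with a toy example. The sketch below is not part of this repository: the action and object names and the helpers `make_pairs` and `seen` are hypothetical, chosen only to show that every action and every object appears in training while no action-object combination is shared with validation.

```python
from itertools import product

# Hypothetical toy labels (not from Something-Else): two disjoint action
# groups {1, 2} and two disjoint object groups {A, B}.
ACTIONS = {1: ["push", "pull"], 2: ["lift", "drop"]}
OBJECTS = {"A": ["cup", "book"], "B": ["box", "pen"]}


def make_pairs(action_group, object_group):
    """All (action, object) combinations for one action group and one object group."""
    return set(product(ACTIONS[action_group], OBJECTS[object_group]))


def seen(pairs, index):
    """Distinct actions (index 0) or objects (index 1) occurring in a set of pairs."""
    return {pair[index] for pair in pairs}


# Training set {1A + 2B}; validation set {1B + 2A}.
train_pairs = make_pairs(1, "A") | make_pairs(2, "B")
val_pairs = make_pairs(1, "B") | make_pairs(2, "A")

# Every validation combination is new to the model ...
unseen_combinations = val_pairs - train_pairs
# ... although each action and each object on its own was seen in training.
actions_covered = seen(val_pairs, 0) <= seen(train_pairs, 0)
objects_covered = seen(val_pairs, 1) <= seen(train_pairs, 1)
```

A model that memorizes which objects go with which actions gains nothing on such a validation set, which is why object appearance acts as a bias here.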

Method

We equip models with the ability to perform counterfactual analysis, so that a more accurate classification can be obtained by comparing the factual inference outcome with the counterfactual inference outcome.

  • We observe that prior knowledge learned from appearance information is mixed with the spurious correlation between action and instance appearance, which severely inhibits the model’s ability to learn actions.
  • We remove the pure appearance effect from the total effect by counterfactual debiasing inference in CDN, our novel framework for compositional action recognition.
  • We achieve state-of-the-art performance for compositional action recognition on the Something-Else dataset.
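The idea of removing the pure appearance effect from the total effect can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes the two outcomes are compared by subtracting logits, and the function `debias`, the weight `alpha`, and the example numbers are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def debias(factual_logits, appearance_logits, alpha=1.0):
    """Subtract the appearance-only (counterfactual) logits from the factual ones.

    Assumption: the effects are combined by subtraction in logit space,
    scaled by a hypothetical weight alpha.
    """
    return [f - alpha * a for f, a in zip(factual_logits, appearance_logits)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logits over three action classes.
factual = [2.0, 1.8, 0.1]          # full model: appearance plus other cues
appearance_only = [1.5, 0.2, 0.1]  # counterfactual: object appearance alone

debiased = debias(factual, appearance_only)
factual_class = argmax(softmax(factual))
debiased_class = argmax(softmax(debiased))
```

In this toy case the factual prediction is class 0, driven mostly by appearance, while the debiased prediction moves to class 1, which the remaining evidence supports.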

Requirements

pip install -r requirements.txt

Dataset

Download the Something-Something dataset and the Something-Else annotations from the Something-Else repo (Materzynska et al., 2020). Note that we also provide per-video annotations for users with limited computing resources, obtained by splitting the Something-Else annotations mentioned above.

Getting Started

To train, test, or run counterfactual debiasing inference, please run these scripts.

Checkpoints

Download our models reported in the paper.

Citation

If you use this code repository in your research, please cite this project.

@inproceedings{counterfactual2021,
  title={Counterfactual Debiasing Inference for Compositional Action Recognition},
  author={Sun, Pengzhan and Wu, Bo and Li, Xunsong and Li, Wen and Duan, Lixin and Gan, Chuang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgments

This repository is implemented as a fork of Something-Else. We used parts of the code from the following repositories:

https://github.com/joaanna/something_else

https://github.com/ruiyan1995/Interactive_Fusion_for_CAR

Contact: pengzhansun6@gmail.com
