Staff View: Harnessing Temporal Causality for Advanced Temporal Action Detection :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Shuming; Sui, Lin; Zhang, Chen-Lin; Mu, Fangzhou; Zhao, Chen; Ghanem, Bernard
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.17792

_version_	1866911968426721280
author	Liu, Shuming; Sui, Lin; Zhang, Chen-Lin; Mu, Fangzhou; Zhao, Chen; Ghanem, Bernard
author_facet	Liu, Shuming; Sui, Lin; Zhang, Chen-Lin; Mu, Fangzhou; Zhao, Chen; Ghanem, Bernard
contents	As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries. Over the years, various networks, including convolutions, graphs, and transformers, have been explored for effective temporal modeling for TAD. However, these modules typically treat past and future information equally, overlooking the crucial fact that changes in action boundaries are essentially causal events. Inspired by this insight, we propose leveraging the temporal causality of actions to enhance TAD representation by restricting the model's access to only past or future context. We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on multiple benchmarks. Notably, with CausalTAD, we ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, as well as 1st in the Moment Queries track at the Ego4D Challenge 2024. Our code is available at https://github.com/sming256/OpenTAD/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_17792
institution	arXiv
publishDate	2024
record_format	arxiv
title	Harnessing Temporal Causality for Advanced Temporal Action Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2407.17792

Similar Items