Bibliographic Details
Main Authors: Chen, Shimin, Li, Wei, Chen, Chen, Gu, Jianyang, Chu, Jiaming, Tao, Xunqiang, Guo, Yandong
Format: Preprint
Published: 2022
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2204.02688
author Chen, Shimin
Li, Wei
Chen, Chen
Gu, Jianyang
Chu, Jiaming
Tao, Xunqiang
Guo, Yandong
contents In this paper, we introduce MM-SEAL, a novel large-scale video dataset for multi-person, multi-grained spatio-temporal action localization in human daily life. We propose the first benchmark for multi-person spatio-temporal complex activity localization, where complex semantics and long durations pose new challenges for localization. We observe that a limited set of atomic actions can be combined into many complex activities. MM-SEAL provides both atomic action and complex activity annotations: 111.7k atomic action instances spanning 172 action categories and 17.7k complex activity instances spanning 200 activity categories. We explore the relationship between atomic actions and complex activities and find that atomic action features can improve complex activity localization performance. We also propose Faster-TAD, a new network that generates temporal proposals and labels simultaneously. Finally, our evaluations show that visual features pretrained on MM-SEAL can improve performance on other action localization benchmarks. We will release the dataset and the project code upon publication of the paper.
format Preprint
id arxiv_https___arxiv_org_abs_2204_02688
institution arXiv
publishDate 2022
record_format arxiv
title MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2204.02688