Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Shimin; Li, Wei; Chen, Chen; Gu, Jianyang; Chu, Jiaming; Tao, Xunqiang; Guo, Yandong
Format:	Preprint
Published:	2022
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2204.02688

_version_	1866910717106454528
author	Chen, Shimin; Li, Wei; Chen, Chen; Gu, Jianyang; Chu, Jiaming; Tao, Xunqiang; Guo, Yandong
author_facet	Chen, Shimin; Li, Wei; Chen, Chen; Gu, Jianyang; Chu, Jiaming; Tao, Xunqiang; Guo, Yandong
contents	In this paper, we introduce a novel large-scale video dataset, dubbed MM-SEAL, for multi-person, multi-grained spatio-temporal action localization in daily human life. We are the first to propose a benchmark for multi-person spatio-temporal complex activity localization, where complex semantics and long durations bring new challenges to localization tasks. We observe that a limited set of atomic actions can be combined into many complex activities. MM-SEAL provides both atomic action and complex activity annotations, comprising 111.7k atomic actions spanning 172 action categories and 17.7k complex activities spanning 200 activity categories. We explore the relationship between atomic actions and complex activities and find that atomic action features can improve complex activity localization performance. We also propose a new network, termed Faster-TAD, that generates temporal proposals and labels simultaneously. Finally, our evaluations show that visual features pretrained on MM-SEAL can improve performance on other action localization benchmarks. We will release the dataset and the project code upon publication of the paper.
format	Preprint
id	arxiv_https___arxiv_org_abs_2204_02688
institution	arXiv
publishDate	2022
record_format	arxiv
title	MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2204.02688