Staff View :: Library Catalog


Bibliographic Details
Main Author:	Tamura, Masato
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2404.09964

_version_	1866913315768238080
author	Tamura, Masato
author_facet	Tamura, Masato
contents	Social group activity recognition is a challenging task extended from group activity recognition, in which social groups must be recognized together with their activities and group members. Existing methods tackle this task by leveraging region features of individuals, following existing group activity recognition methods. However, the effectiveness of region features is susceptible to the accuracy of person localization and to the variable semantics of individual actions. To overcome these issues, we propose leveraging attention modules in transformers to generate social group features. In this method, multiple embeddings are used to aggregate features for a social group, and each embedding is assigned to a group member without duplication. Because of this non-duplicated assignment, the number of embeddings must be large to avoid missing group members, which renders attention in transformers ineffective. To find optimal attention designs for a large number of embeddings, we explore several design choices of queries for feature aggregation and of self-attention modules in transformer decoders. Extensive experimental results show that the proposed method achieves state-of-the-art performance and verify that the proposed attention designs are highly effective for social group activity recognition.
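The abstract states only that multiple embeddings aggregate features through attention and that each embedding is assigned to a group member without duplication. The following is a minimal, dependency-free sketch of those two ideas under stated assumptions: aggregation is modeled as plain scaled dot-product cross-attention, and the non-duplicated assignment as a minimum-cost one-to-one matching. The function names `cross_attention` and `assign_without_duplication`, the squared-distance matching cost, and the toy data are hypothetical illustrations, not the paper's implementation.

```python
import itertools
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query (embedding) aggregates the
    values, weighted by its similarity to the corresponding keys."""
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


def assign_without_duplication(cost: Matrix) -> List[int]:
    """Hypothetical helper: for each member (row), pick a distinct embedding
    index (column) so the total cost is minimal. Exhaustive search, so only
    suitable for toy sizes; needs at least as many embeddings as members."""
    n_members, n_embeddings = len(cost), len(cost[0])
    if n_members > n_embeddings:
        raise ValueError("need at least as many embeddings as group members")
    best: List[int] = []
    best_cost = math.inf
    for perm in itertools.permutations(range(n_embeddings), n_members):
        total = sum(cost[m][e] for m, e in enumerate(perm))
        if total < best_cost:
            best_cost, best = total, list(perm)
    return best


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed matching cost between a member feature and an embedding output."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical toy data: 3 image feature tokens, 4 embeddings
    # (more embeddings than members), and 2 ground-truth group members.
    features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    embeddings = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.0]]
    members = [[1.0, 0.0], [0.0, 1.0]]

    aggregated = cross_attention(embeddings, features, features)
    cost = [[squared_distance(m, a) for a in aggregated] for m in members]
    print(assign_without_duplication(cost))  # one distinct embedding per member
```

Exhaustive search keeps the sketch self-contained; for a realistic (large) number of embeddings, a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment`, which accepts rectangular cost matrices, would be the usual replacement.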
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_09964
institution	arXiv
publishDate	2024
record_format	arxiv
title	Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2404.09964
