Bibliographic Details
Main Author: Tamura, Masato
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2404.09964
author Tamura, Masato
contents Social group activity recognition is a challenging extension of group activity recognition in which social groups must be recognized together with their activities and members. Existing methods tackle this task by leveraging region features of individuals, following existing group activity recognition methods. However, the effectiveness of region features depends on the accuracy of person localization and is susceptible to the variable semantics of individual actions. To overcome these issues, we propose leveraging attention modules in transformers to generate social group features. In this method, multiple embeddings aggregate features for a social group, and each embedding is assigned to a group member without duplication. Because of this non-duplicated assignment, the number of embeddings must be large to avoid missing group members, which renders attention in transformers ineffective. To find optimal attention designs for a large number of embeddings, we explore several design choices for the queries used in feature aggregation and for the self-attention modules in transformer decoders. Extensive experimental results show that the proposed method achieves state-of-the-art performance and verify that the proposed attention designs are highly effective for social group activity recognition.
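The non-duplicated assignment described in the abstract can be illustrated with a minimal sketch, assuming each embedding-member pair already has a scalar matching cost. The function name assign_members and the cost values are hypothetical, and this is not the author's implementation. For clarity it brute-forces every one-to-one mapping; DETR-style set-prediction models typically solve the same bipartite matching with the Hungarian algorithm instead. The sketch also shows why the number of embeddings must be at least the number of group members: without duplication, surplus members cannot be covered by any embedding.

```python
from itertools import permutations
from typing import Dict, Sequence


def assign_members(cost: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Map each group member to a distinct embedding with minimum total cost.

    cost[e][m] is a hypothetical matching cost between embedding e and member m.
    Brute force over all injective mappings; only suitable for tiny inputs.
    """
    num_embeddings = len(cost)
    num_members = len(cost[0]) if num_embeddings else 0
    if num_members > num_embeddings:
        # Without duplication, some member would be left with no embedding.
        raise ValueError("need at least as many embeddings as group members")

    best_total = float("inf")
    best: Dict[int, int] = {}
    # Each tuple picks a distinct embedding for members 0, 1, ..., M-1;
    # embeddings not chosen stay unassigned.
    for chosen in permutations(range(num_embeddings), num_members):
        total = sum(cost[e][m] for m, e in enumerate(chosen))
        if total < best_total:
            best_total = total
            best = {m: e for m, e in enumerate(chosen)}
    return best


if __name__ == "__main__":
    # Three embeddings (rows) and two members (columns); values are made up.
    demo_cost = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print(assign_members(demo_cost))  # {0: 1, 1: 0}
```

Here member 0 goes to embedding 1 and member 1 to embedding 0, while embedding 2 is left unassigned, so no embedding is shared between members.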
format Preprint
id arxiv_https___arxiv_org_abs_2404_09964
institution arXiv
publishDate 2024
title Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition
topic Computer Vision and Pattern Recognition; Machine Learning
url https://arxiv.org/abs/2404.09964