Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Jin, Zhangyu, Feng, Andrew, Chemburkar, Ankur, De Melo, Celso M.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.08933

_version_	1866912553994551296
author	Jin, Zhangyu; Feng, Andrew; Chemburkar, Ankur; De Melo, Celso M.
contents	We present PromptGAR, a novel framework for Group Activity Recognition (GAR) that offers both input flexibility and high recognition accuracy. Existing approaches have limited real-world applicability because they rely on full prompt annotations, require a fixed number of frames and instances, and lack actor consistency. To bridge this gap, we propose PromptGAR, the first GAR model to provide input flexibility across prompts, frames, and instances without retraining. We leverage diverse visual prompts, such as bounding boxes, skeletal keypoints, and instance identities, by unifying them as point prompts. A recognition decoder then cross-updates class and prompt tokens to improve performance. To ensure actor consistency over extended activity durations, we also introduce a relative instance attention mechanism that directly encodes instance identities. Comprehensive evaluations demonstrate that PromptGAR achieves competitive performance on both full and partial prompt inputs, establishing its input flexibility and generalization ability for real-world applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_08933
institution	arXiv
publishDate	2025
record_format	arxiv
title	PromptGAR: Flexible Promptive Group Activity Recognition
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.08933

Similar Items