Bibliographic Details
Main Authors: Jin, Zhangyu, Feng, Andrew, Chemburkar, Ankur, De Melo, Celso M.
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2503.08933
author Jin, Zhangyu
Feng, Andrew
Chemburkar, Ankur
De Melo, Celso M.
contents We present PromptGAR, a novel framework for Group Activity Recognition (GAR) that offers both input flexibility and high recognition accuracy. Existing approaches have limited real-world applicability because they rely on full prompt annotations and a fixed number of frames and instances, and they lack actor consistency. To bridge this gap, we propose PromptGAR, the first GAR model to provide input flexibility across prompts, frames, and instances without retraining. We leverage diverse visual prompts, such as bounding boxes, skeletal keypoints, and instance identities, by unifying them as point prompts. A recognition decoder then cross-updates class and prompt tokens to improve performance. To ensure actor consistency over extended activity durations, we also introduce a relative instance attention mechanism that directly encodes instance identities. Comprehensive evaluations demonstrate that PromptGAR achieves competitive performance on both full and partial prompt inputs, establishing its effectiveness in input flexibility and its generalization ability for real-world applications.
format Preprint
id arxiv_https___arxiv_org_abs_2503_08933
institution arXiv
publishDate 2025
record_format arxiv
title PromptGAR: Flexible Promptive Group Activity Recognition
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2503.08933