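| Main Authors: | Chappa, Naga VS Raviteja; Nguyen, Pha; Nelson, Alexander H; Seo, Han-Seok; Li, Xin; Dobbs, Page Daniel; Luu, Khoa |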
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2305.06310 |
| _version_ | 1866910703637495808 |
|---|---|
| author | Chappa, Naga VS Raviteja; Nguyen, Pha; Nelson, Alexander H; Seo, Han-Seok; Li, Xin; Dobbs, Page Daniel; Luu, Khoa |
| author_facet | Chappa, Naga VS Raviteja; Nguyen, Pha; Nelson, Alexander H; Seo, Han-Seok; Li, Xin; Dobbs, Page Daniel; Luu, Khoa |
| contents | This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using a self-supervised Transformer network that can effectively utilize unlabeled video data. To extract spatio-temporal information, we create local and global views with varying frame rates. Our self-supervised objective ensures that features extracted from contrasting views of the same video are consistent across spatio-temporal domains. Our proposed approach uses transformer-based encoders efficiently to address the weakly supervised setting of group activity recognition. By leveraging the benefits of transformer models, our approach can model long-term relationships along spatio-temporal dimensions. Our proposed SoGAR method achieves state-of-the-art results on three group activity recognition benchmarks, namely the JRDB-PAR, NBA, and Volleyball datasets, surpassing previously reported numbers in terms of F1-score, MCA, and MPCA metrics. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2305_06310 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| title | SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2305.06310 |
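
The abstract mentions local and global views sampled at varying frame rates. As a rough illustration only, the sketch below picks frame indices for such views from a single video. The function names, clip lengths, strides, and view counts are all hypothetical choices made for this example; the record does not state the values or the sampling scheme used in the paper.

```python
import random


def sample_view(num_frames, clip_len, stride, rng):
    """Return frame indices for one temporal view.

    The view holds `clip_len` frames taken every `stride` frames,
    starting at a random offset that keeps the clip inside the video.
    """
    span = (clip_len - 1) * stride + 1  # frames covered from first to last index
    if span > num_frames:
        raise ValueError("video is too short for this view")
    start = rng.randrange(num_frames - span + 1)
    return [start + i * stride for i in range(clip_len)]


def make_views(num_frames, rng=None):
    """Build global and local views of one video (all settings are assumed)."""
    if rng is None:
        rng = random.Random(0)
    # Assumption: global views are longer clips with a larger stride,
    # so they cover a wide temporal extent at a lower frame rate.
    global_views = [sample_view(num_frames, 8, 4, rng) for _ in range(2)]
    # Assumption: local views are shorter clips with a smaller stride,
    # so they cover a narrow temporal extent at a higher frame rate.
    local_views = [sample_view(num_frames, 4, 2, rng) for _ in range(4)]
    return global_views, local_views
```

Calling `make_views(64)` returns two lists of index lists that could be used to slice a decoded video before passing each view to an encoder. A self-supervised objective of the kind the abstract describes would then encourage the features of these views to agree, which this sketch does not implement.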