Bibliographic Details
Main Authors: Xu, Hao; Wei, Xinyu; Wells, Sam; Aryal, Sunil
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2507.07381
author Xu, Hao
Wei, Xinyu
Wells, Sam
Aryal, Sunil
contents Precise Event Spotting (PES) in sports videos requires frame-level recognition of fine-grained actions from single-camera footage. Existing PES models typically incorporate lightweight temporal modules such as the Gate Shift Module (GSM) or Gate Shift Fuse to enrich 2D CNN feature extractors with temporal context. However, these modules are limited in both temporal receptive field and spatial adaptability. We propose the Multi-Focus Temporal Shifting Module (MFS), which enhances GSM with multi-scale temporal shifts and a Group Focus Module, enabling efficient modeling of both short- and long-term dependencies while focusing on salient regions. MFS is a lightweight, plug-and-play module that integrates seamlessly with diverse 2D backbones. To further advance the field, we introduce the Table Tennis Australia dataset, the first PES benchmark for table tennis, containing over 4,800 precisely annotated events. Extensive experiments across five PES benchmarks demonstrate that MFS consistently improves performance with minimal overhead, achieving leading results among lightweight methods (+4.09 mAP, 45 GFLOPs).
format Preprint
id arxiv_https___arxiv_org_abs_2507_07381
institution arXiv
publishDate 2025
record_format arxiv
title Multi-Focus Temporal Shifting for Precise Event Spotting in Sports Videos
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2507.07381