| Main Author: | John, Shahla |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.22421 |
Similar Items
Temporal Alignment-Free Video Matching for Few-shot Action Recognition
by: Lee, SuBeen, et al.
Published: (2025)
Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models
by: Cao, Tri, et al.
Published: (2026)
PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis
by: Choudhuri, Anwesa, et al.
Published: (2025)
One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching
by: Yang, Siyuan, et al.
Published: (2023)
EITNet: An IoT-Enhanced Framework for Real-Time Basketball Action Recognition
by: Liu, Jingyu, et al.
Published: (2024)
UniSOT: A Unified Framework for Multi-Modality Single Object Tracking
by: Ma, Yinchao, et al.
Published: (2025)
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
by: Lin, Yijing, et al.
Published: (2025)
LSTC-MDA: A Unified Framework for Long-Short Term Temporal Convolution and Mixed Data Augmentation in Skeleton-Based Action Recognition
by: Ding, Feng, et al.
Published: (2025)
Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis
by: Kodathala, Sai Varun, et al.
Published: (2025)
Temporal and Spatial Feature Fusion Framework for Dynamic Micro Expression Recognition
by: Liu, Feng, et al.
Published: (2025)
UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking
by: Wang, He, et al.
Published: (2025)
CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
by: Fiastre, Gabriel, et al.
Published: (2025)
Real-Time Manipulation Action Recognition with a Factorized Graph Sequence Encoder
by: Erdogan, Enes, et al.
Published: (2025)
Exploring Explainability in Video Action Recognition
by: Saha, Avinab, et al.
Published: (2024)
RealWonder: Real-Time Physical Action-Conditioned Video Generation
by: Liu, Wei, et al.
Published: (2026)
Fire on Motion: Optimizing Video Pass-bands for Efficient Spiking Action Recognition
by: Ye, Shuhan, et al.
Published: (2026)
Real-Time Human Action Recognition on Embedded Platforms
by: Wang, Ruiqi, et al.
Published: (2024)
Efficient Event-Based Object Detection: A Hybrid Neural Network with Spatial and Temporal Attention
by: Ahmed, Soikat Hasan, et al.
Published: (2024)
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
by: Yuan, Yuqian, et al.
Published: (2024)
Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
by: Xu, Huilin, et al.
Published: (2025)
A Survey on Backbones for Deep Video Action Recognition
by: Tang, Zixuan, et al.
Published: (2024)
YOLO26: An Analysis of NMS-Free End to End Framework for Real-Time Object Detection
by: Chakrabarty, Sudip
Published: (2026)
STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models
by: Wang, Zerui, et al.
Published: (2024)
Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling
by: Haque, Tasmiah, et al.
Published: (2025)
Improving Skeleton-based Action Recognition with Interactive Object Information
by: Wen, Hao, et al.
Published: (2025)
Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution
by: Wang, Jingyao, et al.
Published: (2024)
SV3.3B: A Sports Video Understanding Model for Action Recognition
by: Kodathala, Sai Varun, et al.
Published: (2025)
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition
by: Abdelkawy, Ahmed, et al.
Published: (2024)
Exploring Ordinal Bias in Action Recognition for Instructional Videos
by: Kim, Joochan, et al.
Published: (2025)
SkateboardAI: The Coolest Video Action Recognition for Skateboarding
by: Chen, Hanxiao
Published: (2023)
Flatten: Video Action Recognition is an Image Classification task
by: Chen, Junlin, et al.
Published: (2024)
Efficient Egocentric Action Recognition with Multimodal Data
by: Calzavara, Marco, et al.
Published: (2025)
7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting
by: Gao, Zhongpai, et al.
Published: (2025)
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
by: Zhang, Jianrui, et al.
Published: (2026)
Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
by: Zhang, Chenshuang, et al.
Published: (2025)
Action Recognition in Real-World Ambient Assisted Living Environment
by: Zakka, Vincent Gbouna, et al.
Published: (2025)
Highly Efficient and Unsupervised Framework for Moving Object Detection in Satellite Videos
by: Xiao, C., et al.
Published: (2024)
Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation
by: Luo, Yuanhao, et al.
Published: (2026)
ATSTrack: Enhancing Visual-Language Tracking by Aligning Temporal and Spatial Scales
by: Zhen, Yihao, et al.
Published: (2025)
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
by: Tse, Tze Ho Elden, et al.
Published: (2025)