Library Catalog

Bibliographic Details
Main Authors:	Moodley, Perusha; Kaushik, Pramod; Thambi, Dhillu; Trovinger, Mark; Paruchuri, Praveen; Hong, Xia; Rosman, Benjamin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.01310

Similar Items

MV-GMN: State Space Model for Multi-View Action Recognition
by: Lin, Yuhui, et al.
Published: (2025)

MALT: Multi-scale Action Learning Transformer for Online Action Detection
by: Yang, Zhipeng, et al.
Published: (2024)

Action Selection Learning for Multi-label Multi-view Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2024)

ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy
by: Chandra, Soumyadeep, et al.
Published: (2024)

S3T-Former: A Purely Spike-Driven State-Space Topology Transformer for Skeleton Action Recognition
by: Zheng, Naichuan, et al.
Published: (2026)

Boundary Discretization and Reliable Classification Network for Temporal Action Detection
by: Fang, Zhenying, et al.
Published: (2023)

MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition
by: Wang, Ruoyu, et al.
Published: (2024)

A Real-Time Human Action Recognition Model for Assisted Living
by: Wang, Yixuan, et al.
Published: (2025)

MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition
by: Nguyen, Trung Thanh, et al.
Published: (2025)

MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition
by: Kiray, Mert, et al.
Published: (2025)

A Universal Action Space for General Behavior Analysis
by: Chang, Hung-Shuo, et al.
Published: (2026)

Multi-Granularity Hand Action Detection
by: Zhe, Ting, et al.
Published: (2023)

MVAFormer: RGB-based Multi-View Spatio-Temporal Action Recognition with Transformer
by: Yamane, Taiga, et al.
Published: (2025)

Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization
by: Ullah, Hayat, et al.
Published: (2025)

Learning Action Hierarchies via Hybrid Geometric Diffusion
by: Kaushik, Arjun Ramesh, et al.
Published: (2026)

DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
by: Kim, Ho-Joong, et al.
Published: (2025)

Multi-level and Multi-modal Action Anticipation
by: Kim, Seulgi, et al.
Published: (2025)

Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
by: Shuvo, Rezowan, et al.
Published: (2025)

MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization
by: Fang, Zhenying, et al.
Published: (2025)

MultiModal Action Conditioned Video Generation
by: Li, Yichen, et al.
Published: (2025)

Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
by: Biswas, Shristi Das, et al.
Published: (2025)

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
by: Liang, Zhixuan, et al.
Published: (2025)

MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
by: Nguyen, Trung Thanh, et al.
Published: (2025)

PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction
by: Chen, Shizhe, et al.
Published: (2026)

Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition
by: Zhang, Bokai, et al.
Published: (2024)

Multi-Stage Contrastive Regression for Action Quality Assessment
by: An, Qi, et al.
Published: (2024)

Dual DETRs for Multi-Label Temporal Action Detection
by: Zhu, Yuhan, et al.
Published: (2024)

MMAD: Multi-label Micro-Action Detection in Videos
by: Li, Kun, et al.
Published: (2024)

Multi-task Learning For Joint Action and Gesture Recognition
by: Spathis, Konstantinos, et al.
Published: (2025)

Advancing Compressed Video Action Recognition through Progressive Knowledge Distillation
by: Soufleri, Efstathia, et al.
Published: (2024)

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
by: Souček, Tomáš, et al.
Published: (2023)

One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
by: Nguyen, Trung Thanh, et al.
Published: (2024)

Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy
by: Zhang, Tianyi, et al.
Published: (2025)

SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation
by: Liu, Qi, et al.
Published: (2023)

HiMemFormer: Hierarchical Memory-Aware Transformer for Multi-Agent Action Anticipation
by: Wang, Zirui, et al.
Published: (2024)

Interaction-via-Actions: Cattle Interaction Detection with Joint Learning of Action-Interaction Latent Space
by: Nakagawa, Ren, et al.
Published: (2025)

An Effective-Efficient Approach for Dense Multi-Label Action Detection
by: Sardari, Faegheh, et al.
Published: (2024)

Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)

MAMMA: Markerless & Automatic Multi-Person Motion Action Capture
by: Cuevas-Velasquez, Hanz, et al.
Published: (2025)

ActionParty: Multi-Subject Action Binding in Generative Video Games
by: Pondaven, Alexander, et al.
Published: (2026)