Library Catalog

Bibliographic Details
Main Authors:	Sia, Zhen Hao; Rawat, Yogesh Singh
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.03096

Similar Items

Stable Mean Teacher for Semi-supervised Video Action Detection
by: Kumar, Akash, et al.
Published: (2024)

On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
by: Modi, Rajat, et al.
Published: (2024)

Semi-supervised Active Learning for Video Action Detection
by: Singh, Ayush, et al.
Published: (2023)

Activity-Biometrics: Person Identification from Daily Activities
by: Azad, Shehreen, et al.
Published: (2024)

Open-Vocabulary Spatio-Temporal Action Detection
by: Wu, Tao, et al.
Published: (2024)

EZ-CLIP: Efficient Zeroshot Video Action Recognition
by: Ahmad, Shahzad, et al.
Published: (2023)

Asynchronous Perception Machine For Efficient Test-Time-Training
by: Modi, Rajat, et al.
Published: (2024)

Scaling Open-Vocabulary Object Detection
by: Minderer, Matthias, et al.
Published: (2023)

MolVision: Molecular Property Prediction with Vision Language Models
by: Adak, Deepan, et al.
Published: (2025)

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
by: Azad, Shehreen, et al.
Published: (2025)

iSafetyBench: A video-language benchmark for safety in industrial environment
by: Abdullah, Raiyaan, et al.
Published: (2025)

Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
by: Kumar, Akash, et al.
Published: (2025)

StreamReady: Learning What to Answer and When in Long Streaming Videos
by: Azad, Shehreen, et al.
Published: (2026)

Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement
by: Pathak, Priyank, et al.
Published: (2025)

DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
by: Liang, Xin, et al.
Published: (2025)

DisenQ: Disentangling Q-Former for Activity-Biometrics
by: Azad, Shehreen, et al.
Published: (2025)

Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID
by: Pathak, Priyank, et al.
Published: (2025)

ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction
by: Mitra, Sirshapan, et al.
Published: (2026)

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
by: Bao, Wentao, et al.
Published: (2024)

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
by: Kumar, Akash, et al.
Published: (2025)

GaitCrafter: Diffusion Model for Biometric Preserving Gait Synthesis
by: Mitra, Sirshapan, et al.
Published: (2025)

STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
by: Garg, Aaryan, et al.
Published: (2025)

Navigating Hallucinations for Reasoning of Unintentional Activities
by: Grover, Shresth, et al.
Published: (2024)

Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images
by: Jha, Abhishek, et al.
Published: (2024)

Open Vocabulary Monocular 3D Object Detection
by: Yao, Jin, et al.
Published: (2024)

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
by: Cheng, Haozhe, et al.
Published: (2024)

Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset
by: Ni, TsaiChing, et al.
Published: (2025)

One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
by: Nguyen, Trung Thanh, et al.
Published: (2024)

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
by: Yuan, Zhenlong, et al.
Published: (2025)

Learning to Generalize without Bias for Open-Vocabulary Action Recognition
by: Yu, Yating, et al.
Published: (2025)

Open-Vocabulary Video Anomaly Detection
by: Wu, Peng, et al.
Published: (2023)

Open-Vocabulary Temporal Action Localization using Multimodal Guidance
by: Gupta, Akshita, et al.
Published: (2024)

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
by: Zhang, Hao, et al.
Published: (2023)

Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)

FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
by: Huang, Xiaohu, et al.
Published: (2024)

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)

Learning to Detect and Segment for Open Vocabulary Object Detection
by: Wang, Tao, et al.
Published: (2022)

LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models
by: Pathak, Priyank, et al.
Published: (2025)

OmViD: Omni-supervised active learning for video action detection
by: Rana, Aayush, et al.
Published: (2025)