| Main Authors: | Sia, Zhen Hao, Rawat, Yogesh Singh |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.03096 |
Similar Items
Stable Mean Teacher for Semi-supervised Video Action Detection
by: Kumar, Akash, et al.
Published: (2024)
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
by: Modi, Rajat, et al.
Published: (2024)
Semi-supervised Active Learning for Video Action Detection
by: Singh, Ayush, et al.
Published: (2023)
Activity-Biometrics: Person Identification from Daily Activities
by: Azad, Shehreen, et al.
Published: (2024)
Open-Vocabulary Spatio-Temporal Action Detection
by: Wu, Tao, et al.
Published: (2024)
EZ-CLIP: Efficient Zeroshot Video Action Recognition
by: Ahmad, Shahzad, et al.
Published: (2023)
Asynchronous Perception Machine For Efficient Test-Time-Training
by: Modi, Rajat, et al.
Published: (2024)
Scaling Open-Vocabulary Object Detection
by: Minderer, Matthias, et al.
Published: (2023)
MolVision: Molecular Property Prediction with Vision Language Models
by: Adak, Deepan, et al.
Published: (2025)
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
by: Azad, Shehreen, et al.
Published: (2025)
iSafetyBench: A video-language benchmark for safety in industrial environment
by: Abdullah, Raiyaan, et al.
Published: (2025)
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
by: Kumar, Akash, et al.
Published: (2025)
StreamReady: Learning What to Answer and When in Long Streaming Videos
by: Azad, Shehreen, et al.
Published: (2026)
Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement
by: Pathak, Priyank, et al.
Published: (2025)
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
by: Liang, Xin, et al.
Published: (2025)
DisenQ: Disentangling Q-Former for Activity-Biometrics
by: Azad, Shehreen, et al.
Published: (2025)
Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID
by: Pathak, Priyank, et al.
Published: (2025)
ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction
by: Mitra, Sirshapan, et al.
Published: (2026)
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
by: Bao, Wentao, et al.
Published: (2024)
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
by: Kumar, Akash, et al.
Published: (2025)
GaitCrafter: Diffusion Model for Biometric Preserving Gait Synthesis
by: Mitra, Sirshapan, et al.
Published: (2025)
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
by: Garg, Aaryan, et al.
Published: (2025)
Navigating Hallucinations for Reasoning of Unintentional Activities
by: Grover, Shresth, et al.
Published: (2024)
Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images
by: Jha, Abhishek, et al.
Published: (2024)
Open Vocabulary Monocular 3D Object Detection
by: Yao, Jin, et al.
Published: (2024)
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
by: Cheng, Haozhe, et al.
Published: (2024)
Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset
by: Ni, TsaiChing, et al.
Published: (2025)
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
by: Nguyen, Trung Thanh, et al.
Published: (2024)
Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
by: Yuan, Zhenlong, et al.
Published: (2025)
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
by: Yu, Yating, et al.
Published: (2025)
Open-Vocabulary Video Anomaly Detection
by: Wu, Peng, et al.
Published: (2023)
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
by: Gupta, Akshita, et al.
Published: (2024)
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
by: Zhang, Hao, et al.
Published: (2023)
Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
by: Huang, Xiaohu, et al.
Published: (2024)
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection
by: Chen, Fangyi, et al.
Published: (2024)
Learning to Detect and Segment for Open Vocabulary Object Detection
by: Wang, Tao, et al.
Published: (2022)
LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models
by: Pathak, Priyank, et al.
Published: (2025)
OmViD: Omni-supervised active learning for video action detection
by: Rana, Aayush, et al.
Published: (2025)