Bibliographic Details
Main Author:	Huang, Jiantang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.01512

Similar Items

Zero-Shot Action Recognition in Surveillance Videos
by: Pereira, Joao, et al.
Published: (2024)

Slow-Motion Video Synthesis for Basketball Using Frame Interpolation
by: Huang, Jiantang
Published: (2025)

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
by: Yang, Shuai, et al.
Published: (2024)

A Modular Zero-Shot Pipeline for Accident Detection, Localization, and Classification in Traffic Surveillance Video
by: Thakur, Amey, et al.
Published: (2026)

Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence
by: Yang, Shuai, et al.
Published: (2025)

Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
by: Yang, Zaiquan, et al.
Published: (2025)

TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding
by: Lee, Jin-Seop, et al.
Published: (2025)

VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
by: Xu, Yifang, et al.
Published: (2024)

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
by: Li, Maomao, et al.
Published: (2023)

T2SGrid: Temporal-to-Spatial Gridification for Video Temporal Grounding
by: Guo, Chaohong, et al.
Published: (2026)

Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM
by: Kamboj, Payal, et al.
Published: (2025)

GRAZE: Grounded Refinement and Motion-Aware Zero-Shot Event Localization
by: Zaidi, Syed Ahsan Masud, et al.
Published: (2026)

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
by: Xiong, Yuanhao, et al.
Published: (2023)

TRACE: Temporal Grounding Video LLM via Causal Event Modeling
by: Guo, Yongxin, et al.
Published: (2024)

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
by: Couairon, Paul, et al.
Published: (2023)

GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
by: Cheng, Zixu, et al.
Published: (2026)

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2025)

Zero-Shot Temporal Interaction Localization for Egocentric Videos
by: Zhang, Erhang, et al.
Published: (2025)

EZSR: Event-based Zero-Shot Recognition
by: Yang, Yan, et al.
Published: (2024)

Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)

WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
by: Kong, Quan, et al.
Published: (2024)

Scaling Zero-Shot Reference-to-Video Generation
by: Zhou, Zijian, et al.
Published: (2025)

Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)

GroundingAnomaly: Spatially-Grounded Diffusion for Few-Shot Anomaly Synthesis
by: Liu, Yishen, et al.
Published: (2026)

Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding
by: Tatematsu, Fumiya, et al.
Published: (2026)

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
by: Liao, Ruotong, et al.
Published: (2024)

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)

Zero-Shot Video Deraining with Video Diffusion Models
by: Varanka, Tuomas, et al.
Published: (2025)

TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
by: Xia, Yan, et al.
Published: (2024)

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
by: Jung, Minjoon, et al.
Published: (2026)

Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
by: Dipta, Shubhashis Roy, et al.
Published: (2025)

Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks
by: Ma, Xiangkai, et al.
Published: (2025)

VISTA: Validation-Guided Integration of Spatial and Temporal Foundation Models with Anatomical Decoding for Rare-Pathology VCE Event Detection
by: Qiu, Bo-Cheng, et al.
Published: (2026)

Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture
by: Singh, Tanu, et al.
Published: (2025)

Test-Time Zero-Shot Temporal Action Localization
by: Liberatori, Benedetta, et al.
Published: (2024)

MVP: Motion Vector Propagation for Zero-Shot Video Object Detection
by: Huang, Binhua, et al.
Published: (2025)

Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos
by: Zaheer, Muhammad Zaigham, et al.
Published: (2022)

Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)

Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
by: Cohen, Nathaniel, et al.
Published: (2024)

Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement
by: Li, Yini, et al.
Published: (2025)