Library Catalog

Bibliographic Details
Main Authors:	De la Jara, Ignacio M., Rodriguez-Opazo, Cristian, Marrese-Taylor, Edison, Bravo-Marquez, Felipe
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.17007

Similar Items

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
by: Rodriguez-Opazo, Cristian, et al.
Published: (2024)

Image-Text Relation Prediction for Multilingual Tweets
by: Rikters, Matīss, et al.
Published: (2025)

Mysteries of the Deep: Role of Intermediate Representations in Out of Distribution Detection
by: De la Jara, I. M., et al.
Published: (2025)

Temporally Grounding Instructional Diagrams in Unconstrained Videos
by: Zhang, Jiahao, et al.
Published: (2024)

Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction
by: Liu, Zheyuan, et al.
Published: (2025)

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
by: Jung, Minjoon, et al.
Published: (2026)

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
by: Wang, Shihao, et al.
Published: (2025)

Moment Quantization for Video Temporal Grounding
by: Sun, Xiaolong, et al.
Published: (2025)

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)

T2SGrid: Temporal-to-Spatial Gridification for Video Temporal Grounding
by: Guo, Chaohong, et al.
Published: (2026)

Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
by: Venkataramanan, Shashanka, et al.
Published: (2023)

Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)

Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)

ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
by: Wang, Yubin, et al.
Published: (2024)

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
by: Yang, Zuhao, et al.
Published: (2025)

Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)

Multi-Scale Contrastive Learning for Video Temporal Grounding
by: Nguyen, Thong Thanh, et al.
Published: (2024)

Number it: Temporal Grounding Videos like Flipping Manga
by: Wu, Yongliang, et al.
Published: (2024)

VideoMolmo: Spatio-Temporal Grounding Meets Pointing
by: Ahmad, Ghazi Shazan, et al.
Published: (2025)

SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
by: Drago, Mauro Orazio, et al.
Published: (2025)

Sparse-Dense Side-Tuner for efficient Video Temporal Grounding
by: Pujol-Perich, David, et al.
Published: (2025)

SimBase: A Simple Baseline for Temporal Video Grounding
by: Bao, Peijun, et al.
Published: (2024)

Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding
by: Ren, Junlong, et al.
Published: (2025)

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
by: Xiong, Yuanhao, et al.
Published: (2023)

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
by: Zhang, Jinglei, et al.
Published: (2025)

Static and Dynamic Graph Alignment Network for Temporal Video Grounding
by: Hu, Zhanjie, et al.
Published: (2026)

Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
by: Moon, WonJun, et al.
Published: (2023)

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
by: Pramanick, Shraman, et al.
Published: (2025)

GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding
by: Fan, Rong, et al.
Published: (2026)

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
by: Guo, Yongxin, et al.
Published: (2024)

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
by: Liu, Ye, et al.
Published: (2024)

EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
by: Li, Guozhang, et al.
Published: (2023)

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)

TRACE: Temporal Grounding Video LLM via Causal Event Modeling
by: Guo, Yongxin, et al.
Published: (2024)

A Survey on Video Temporal Grounding with Multimodal Large Language Model
by: Wu, Jianlong, et al.
Published: (2025)

Temporal Grounding as a Learning Signal for Referring Video Object Segmentation
by: Lee, Seunghun, et al.
Published: (2025)

Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
by: Hu, Jingjing, et al.
Published: (2024)

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)

Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
by: Tu, Xuezhen, et al.
Published: (2026)

SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding
by: Han, Jiwook, et al.
Published: (2026)