| Main Authors: | De la Jara, Ignacio M., Rodriguez-Opazo, Cristian, Marrese-Taylor, Edison, Bravo-Marquez, Felipe |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.17007 |
Similar Items
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
by: Rodriguez-Opazo, Cristian, et al.
Published: (2024)
Image-Text Relation Prediction for Multilingual Tweets
by: Rikters, Matīss, et al.
Published: (2025)
Mysteries of the Deep: Role of Intermediate Representations in Out of Distribution Detection
by: De la Jara, I. M., et al.
Published: (2025)
Temporally Grounding Instructional Diagrams in Unconstrained Videos
by: Zhang, Jiahao, et al.
Published: (2024)
Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction
by: Liu, Zheyuan, et al.
Published: (2025)
EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
by: Jung, Minjoon, et al.
Published: (2026)
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
by: Wang, Shihao, et al.
Published: (2025)
Moment Quantization for Video Temporal Grounding
by: Sun, Xiaolong, et al.
Published: (2025)
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)
T2SGrid: Temporal-to-Spatial Gridification for Video Temporal Grounding
by: Guo, Chaohong, et al.
Published: (2026)
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
by: Venkataramanan, Shashanka, et al.
Published: (2023)
Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)
Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
by: Wang, Yubin, et al.
Published: (2024)
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
by: Yang, Zuhao, et al.
Published: (2025)
Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)
Multi-Scale Contrastive Learning for Video Temporal Grounding
by: Nguyen, Thong Thanh, et al.
Published: (2024)
Number it: Temporal Grounding Videos like Flipping Manga
by: Wu, Yongliang, et al.
Published: (2024)
VideoMolmo: Spatio-Temporal Grounding Meets Pointing
by: Ahmad, Ghazi Shazan, et al.
Published: (2025)
SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
by: Drago, Mauro Orazio, et al.
Published: (2025)
Sparse-Dense Side-Tuner for efficient Video Temporal Grounding
by: Pujol-Perich, David, et al.
Published: (2025)
SimBase: A Simple Baseline for Temporal Video Grounding
by: Bao, Peijun, et al.
Published: (2024)
Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding
by: Ren, Junlong, et al.
Published: (2025)
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
by: Xiong, Yuanhao, et al.
Published: (2023)
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
by: Zhang, Jinglei, et al.
Published: (2025)
Static and Dynamic Graph Alignment Network for Temporal Video Grounding
by: Hu, Zhanjie, et al.
Published: (2026)
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
by: Moon, WonJun, et al.
Published: (2023)
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
by: Pramanick, Shraman, et al.
Published: (2025)
GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding
by: Fan, Rong, et al.
Published: (2026)
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
by: Guo, Yongxin, et al.
Published: (2024)
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
by: Liu, Ye, et al.
Published: (2024)
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
by: Li, Guozhang, et al.
Published: (2023)
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
by: Wang, Haibo, et al.
Published: (2024)
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
by: Guo, Yongxin, et al.
Published: (2024)
A Survey on Video Temporal Grounding with Multimodal Large Language Model
by: Wu, Jianlong, et al.
Published: (2025)
Temporal Grounding as a Learning Signal for Referring Video Object Segmentation
by: Lee, Seunghun, et al.
Published: (2025)
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
by: Hu, Jingjing, et al.
Published: (2024)
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)
Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
by: Tu, Xuezhen, et al.
Published: (2026)
SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding
by: Han, Jiwook, et al.
Published: (2026)