:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Xie, Yuan, Chen, Tianshui, Ge, Zheng, Ni, Lionel
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.20478
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
by: Ma, David, et al.
Published: (2025)

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
by: Fu, Shenghao, et al.
Published: (2025)

Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)

VideoPro: Adaptive Program Reasoning for Long Video Understanding
by: Li, Chenglin, et al.
Published: (2025)

Long Video Understanding with Learnable Retrieval in Video-Language Models
by: Xu, Jiaqi, et al.
Published: (2023)

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
by: Ma, Wentao, et al.
Published: (2025)

Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
by: Wang, Yuxuan, et al.
Published: (2024)

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
by: Chen, Qirui, et al.
Published: (2024)

VideoExplorer: Think With Videos For Agentic Long-Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)

Event-Anchored Frame Selection for Effective Long-Video Understanding
by: Chen, Wang, et al.
Published: (2026)

VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)

Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
by: Hu, Pengfei, et al.
Published: (2025)

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
by: Qiu, Jihao, et al.
Published: (2026)

VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
by: Ren, Weiming, et al.
Published: (2025)

Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)

Reinforcing Video Reasoning with Focused Thinking
by: Dang, Jisheng, et al.
Published: (2025)

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
by: Yang, Zhenyu, et al.
Published: (2025)

VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding
by: Lin, Kuanwei, et al.
Published: (2026)

Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding
by: Wang, Zheng, et al.
Published: (2026)

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
by: Pan, Yaning, et al.
Published: (2025)

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
by: Shen, Xiaoqian, et al.
Published: (2025)

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
by: Chen, Wang, et al.
Published: (2026)

VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
by: Ding, Yang, et al.
Published: (2025)

MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding
by: Su, Yuhao, et al.
Published: (2025)

Memory-enhanced Retrieval Augmentation for Long Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)

Decoupling Perception from Reasoning for Hallucination-Resistant Video Understanding
by: Pu, Bowei, et al.
Published: (2025)

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)

Zero-Shot Long-Form Video Understanding through Screenplay
by: Wu, Yongliang, et al.
Published: (2024)

FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)

Hallucination Mitigation Prompts Long-term Video Understanding
by: Sun, Yiwei, et al.
Published: (2024)

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
by: Qi, Ji, et al.
Published: (2025)

Linear Scaling Video VLMs for Long Video Understanding
by: Eyzaguirre, Cristobal, et al.
Published: (2026)

Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
by: Liao, Ruotong, et al.
Published: (2024)

Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)