| Main Authors: | Xie, Yuan; Chen, Tianshui; Ge, Zheng; Ni, Lionel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.20478 |
Similar Items
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
by: Ma, David, et al.
Published: (2025)
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
by: Fu, Shenghao, et al.
Published: (2025)
Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)
VideoPro: Adaptive Program Reasoning for Long Video Understanding
by: Li, Chenglin, et al.
Published: (2025)
Long Video Understanding with Learnable Retrieval in Video-Language Models
by: Xu, Jiaqi, et al.
Published: (2023)
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
by: Ma, Wentao, et al.
Published: (2025)
Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
by: Wang, Yuxuan, et al.
Published: (2024)
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
by: Chen, Qirui, et al.
Published: (2024)
VideoExplorer: Think With Videos For Agentic Long-Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)
Event-Anchored Frame Selection for Effective Long-Video Understanding
by: Chen, Wang, et al.
Published: (2026)
VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
by: Hu, Pengfei, et al.
Published: (2025)
LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
by: Qiu, Jihao, et al.
Published: (2026)
VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
by: Ren, Weiming, et al.
Published: (2025)
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)
Reinforcing Video Reasoning with Focused Thinking
by: Dang, Jisheng, et al.
Published: (2025)
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
by: Yang, Zhenyu, et al.
Published: (2025)
VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding
by: Lin, Kuanwei, et al.
Published: (2026)
Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding
by: Wang, Zheng, et al.
Published: (2026)
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
by: Pan, Yaning, et al.
Published: (2025)
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
by: Shen, Xiaoqian, et al.
Published: (2025)
Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
by: Chen, Wang, et al.
Published: (2026)
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
by: Ding, Yang, et al.
Published: (2025)
MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding
by: Su, Yuhao, et al.
Published: (2025)
Memory-enhanced Retrieval Augmentation for Long Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)
Decoupling Perception from Reasoning for Hallucination-Resistant Video Understanding
by: Pu, Bowei, et al.
Published: (2025)
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
by: He, Zefeng, et al.
Published: (2025)
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)
Zero-Shot Long-Form Video Understanding through Screenplay
by: Wu, Yongliang, et al.
Published: (2024)
FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)
Hallucination Mitigation Prompts Long-term Video Understanding
by: Sun, Yiwei, et al.
Published: (2024)
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
by: Qi, Ji, et al.
Published: (2025)
Linear Scaling Video VLMs for Long Video Understanding
by: Eyzaguirre, Cristobal, et al.
Published: (2026)
Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
by: Liao, Ruotong, et al.
Published: (2024)
Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)