Bibliographic Details
Main Authors:	Yuan, Chao; Yang, Yang; Yang, Yehui; Cheng, Zach
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.09263

Similar Items

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
by: Zhai, Yingjie, et al.
Published: (2024)

RRNet: Configurable Real-Time Video Enhancement with Arbitrary Local Lighting Variations
by: Yang, Wenlong, et al.
Published: (2026)

VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
by: Jin, Hongbo, et al.
Published: (2025)

SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration
by: Yang, Zhongyu, et al.
Published: (2026)

Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)

Learning Compact Video Representations for Efficient Long-form Video Understanding in Large Multimodal Models
by: Chen, Yuxiao, et al.
Published: (2026)

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
by: Chen, Boyu, et al.
Published: (2025)

FCMBench-Video: Benchmarking Document Video Intelligence
by: Cui, Runze, et al.
Published: (2026)

SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection
by: Dong, Shuhan, et al.
Published: (2024)

Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)

FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)

EEA: Exploration-Exploitation Agent for Long Video Understanding
by: Yang, Te, et al.
Published: (2025)

VCA: Video Curious Agent for Long Video Understanding
by: Yang, Zeyuan, et al.
Published: (2024)

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)

VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding
by: Yu, Xueqing, et al.
Published: (2026)

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
by: Ren, Weiming, et al.
Published: (2024)

VideoLucy: Deep Memory Backtracking for Long Video Understanding
by: Zuo, Jialong, et al.
Published: (2025)

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
by: Schneider, Benjamin, et al.
Published: (2025)

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
by: Wang, Yuxuan, et al.
Published: (2024)

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
by: Ren, Weiming, et al.
Published: (2025)

The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos
by: Wu, Zhuoyuan, et al.
Published: (2025)

FOCUS: Efficient Keyframe Selection for Long Video Understanding
by: Zhu, Zirui, et al.
Published: (2025)

An Embeddable Implicit IUVD Representation for Part-based 3D Human Surface Reconstruction
by: Li, Baoxing, et al.
Published: (2024)

Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents
by: Li, Jiahua, et al.
Published: (2025)

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
by: Yang, Ruoliu, et al.
Published: (2026)

FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)

Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding
by: Xie, Yuan, et al.
Published: (2025)

Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
by: Jiang, Xixi, et al.
Published: (2025)

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
by: Fu, Shenghao, et al.
Published: (2025)

VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)

Omni-Video: Democratizing Unified Video Understanding and Generation
by: Tan, Zhiyu, et al.
Published: (2025)

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
by: Ma, David, et al.
Published: (2025)

Long Context Tuning for Video Generation
by: Guo, Yuwei, et al.
Published: (2025)

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
by: Lin, Chin-Yang, et al.
Published: (2025)

CoS: Chain-of-Shot Prompting for Long Video Understanding
by: Hu, Jian, et al.
Published: (2025)

Zero-Shot Long-Form Video Understanding through Screenplay
by: Wu, Yongliang, et al.
Published: (2024)

LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos
by: Xu, Chenyi, et al.
Published: (2026)

TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
by: Pan, Junwen, et al.
Published: (2025)

Zero-Shot Video Restoration and Enhancement with Assistance of Video Diffusion Models
by: Cao, Cong, et al.
Published: (2026)

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
by: Yang, Zongxin, et al.
Published: (2024)