| Main Authors: | Yuan, Chao; Yang, Yang; Yang, Yehui; Cheng, Zach |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.09263 |
Similar Items
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
by: Zhai, Yingjie, et al.
Published: (2024)
RRNet: Configurable Real-Time Video Enhancement with Arbitrary Local Lighting Variations
by: Yang, Wenlong, et al.
Published: (2026)
VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
by: Jin, Hongbo, et al.
Published: (2025)
SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration
by: Yang, Zhongyu, et al.
Published: (2026)
Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)
Learning Compact Video Representations for Efficient Long-form Video Understanding in Large Multimodal Models
by: Chen, Yuxiao, et al.
Published: (2026)
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
by: Chen, Boyu, et al.
Published: (2025)
FCMBench-Video: Benchmarking Document Video Intelligence
by: Cui, Runze, et al.
Published: (2026)
SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection
by: Dong, Shuhan, et al.
Published: (2024)
Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)
FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)
EEA: Exploration-Exploitation Agent for Long Video Understanding
by: Yang, Te, et al.
Published: (2025)
VCA: Video Curious Agent for Long Video Understanding
by: Yang, Zeyuan, et al.
Published: (2024)
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)
VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding
by: Yu, Xueqing, et al.
Published: (2026)
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
by: Ren, Weiming, et al.
Published: (2024)
VideoLucy: Deep Memory Backtracking for Long Video Understanding
by: Zuo, Jialong, et al.
Published: (2025)
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
by: Schneider, Benjamin, et al.
Published: (2025)
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
by: Wang, Yuxuan, et al.
Published: (2024)
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
by: Ren, Weiming, et al.
Published: (2025)
The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos
by: Wu, Zhuoyuan, et al.
Published: (2025)
FOCUS: Efficient Keyframe Selection for Long Video Understanding
by: Zhu, Zirui, et al.
Published: (2025)
An Embeddable Implicit IUVD Representation for Part-based 3D Human Surface Reconstruction
by: Li, Baoxing, et al.
Published: (2024)
Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents
by: Li, Jiahua, et al.
Published: (2025)
VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
by: Yang, Ruoliu, et al.
Published: (2026)
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)
Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding
by: Xie, Yuan, et al.
Published: (2025)
Token Merging via Spatiotemporal Information Mining for Surgical Video Understanding
by: Jiang, Xixi, et al.
Published: (2025)
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
by: Fu, Shenghao, et al.
Published: (2025)
VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)
Omni-Video: Democratizing Unified Video Understanding and Generation
by: Tan, Zhiyu, et al.
Published: (2025)
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
by: Ma, David, et al.
Published: (2025)
Long Context Tuning for Video Generation
by: Guo, Yuwei, et al.
Published: (2025)
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
by: Lin, Chin-Yang, et al.
Published: (2025)
CoS: Chain-of-Shot Prompting for Long Video Understanding
by: Hu, Jian, et al.
Published: (2025)
Zero-Shot Long-Form Video Understanding through Screenplay
by: Wu, Yongliang, et al.
Published: (2024)
LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos
by: Xu, Chenyi, et al.
Published: (2026)
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
by: Pan, Junwen, et al.
Published: (2025)
Zero-Shot Video Restoration and Enhancement with Assistance of Video Diffusion Models
by: Cao, Cong, et al.
Published: (2026)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
by: Yang, Zongxin, et al.
Published: (2024)