| Main Authors: | Gurukar, Saket, Kadav, Asim |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.13707 |
Similar Items
CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding
by: Patel, Shrenik, et al.
Published: (2025)
VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)
Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding
by: Wang, Yun, et al.
Published: (2025)
Text-Conditioned Resampler For Long Form Video Understanding
by: Korbar, Bruno, et al.
Published: (2023)
VideoLucy: Deep Memory Backtracking for Long Video Understanding
by: Zuo, Jialong, et al.
Published: (2025)
LongViTU: Instruction Tuning for Long-Form Video Understanding
by: Wu, Rujie, et al.
Published: (2025)
T*: Re-thinking Temporal Search for Long-Form Video Understanding
by: Ye, Jinhui, et al.
Published: (2025)
Zero-Shot Long-Form Video Understanding through Screenplay
by: Wu, Yongliang, et al.
Published: (2024)
VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
by: Jin, Hongbo, et al.
Published: (2025)
Memory Consolidation Enables Long-Context Video Understanding
by: Balažević, Ivana, et al.
Published: (2024)
Memory-enhanced Retrieval Augmentation for Long Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)
GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory
by: Yeo, Jeong Hun, et al.
Published: (2025)
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)
Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
by: Jahagirdar, Soumya Shamarao, et al.
Published: (2025)
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
by: Wang, Yuxuan, et al.
Published: (2024)
LongVLM: Efficient Long Video Understanding via Large Language Models
by: Weng, Yuetian, et al.
Published: (2024)
MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)
Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models
by: Ma, Martin Q., et al.
Published: (2026)
Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding
by: Yamao, Sosuke, et al.
Published: (2026)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
by: Song, Enxin, et al.
Published: (2023)
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
by: Zhang, Haoji, et al.
Published: (2024)
ReWind: Understanding Long Videos with Instructed Learnable Memory
by: Diko, Anxhelo, et al.
Published: (2024)
Hierarchical Memory for Long Video QA
by: Wang, Yiqin, et al.
Published: (2024)
NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning
by: Shah, Sahil, et al.
Published: (2025)
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
by: Li, Jiaze, et al.
Published: (2025)
Temporal Preference Optimization for Long-Form Video Understanding
by: Li, Rui, et al.
Published: (2025)
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
by: He, Bo, et al.
Published: (2024)
Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)
Narrative Aligned Long Form Video Question Answering
by: Jain, Rahul, et al.
Published: (2026)
Language Repository for Long Video Understanding
by: Kahatapitiya, Kumara, et al.
Published: (2024)
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
by: Zhi, Zhuo, et al.
Published: (2025)
LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
by: Qiu, Jihao, et al.
Published: (2026)
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
by: Chen, Qirui, et al.
Published: (2024)
From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
by: Sun, Guangyu, et al.
Published: (2025)
Towards Long Video Understanding via Fine-detailed Video Story Generation
by: You, Zeng, et al.
Published: (2024)
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)
Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism
by: Chen, Tao, et al.
Published: (2026)
Linear Scaling Video VLMs for Long Video Understanding
by: Eyzaguirre, Cristobal, et al.
Published: (2026)
Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)