:: Library Catalog

Bibliographic Details
Main Authors:	Gurukar, Saket; Kadav, Asim
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.13707

Similar Items

CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding
by: Patel, Shrenik, et al.
Published: (2025)

VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
by: Yin, Yufei, et al.
Published: (2025)

Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding
by: Wang, Yun, et al.
Published: (2025)

Text-Conditioned Resampler For Long Form Video Understanding
by: Korbar, Bruno, et al.
Published: (2023)

VideoLucy: Deep Memory Backtracking for Long Video Understanding
by: Zuo, Jialong, et al.
Published: (2025)

LongViTU: Instruction Tuning for Long-Form Video Understanding
by: Wu, Rujie, et al.
Published: (2025)

T*: Re-thinking Temporal Search for Long-Form Video Understanding
by: Ye, Jinhui, et al.
Published: (2025)

Zero-Shot Long-Form Video Understanding through Screenplay
by: Wu, Yongliang, et al.
Published: (2024)

VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
by: Jin, Hongbo, et al.
Published: (2025)

Memory Consolidation Enables Long-Context Video Understanding
by: Balažević, Ivana, et al.
Published: (2024)

Memory-enhanced Retrieval Augmentation for Long Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)

GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory
by: Yeo, Jeong Hun, et al.
Published: (2025)

Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)

Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
by: Jahagirdar, Soumya Shamarao, et al.
Published: (2025)

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
by: Wang, Yuxuan, et al.
Published: (2024)

LongVLM: Efficient Long Video Understanding via Large Language Models
by: Weng, Yuetian, et al.
Published: (2024)

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)

Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models
by: Ma, Martin Q., et al.
Published: (2026)

Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding
by: Yamao, Sosuke, et al.
Published: (2026)

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
by: Song, Enxin, et al.
Published: (2023)

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
by: Zhang, Haoji, et al.
Published: (2024)

ReWind: Understanding Long Videos with Instructed Learnable Memory
by: Diko, Anxhelo, et al.
Published: (2024)

Hierarchical Memory for Long Video QA
by: Wang, Yiqin, et al.
Published: (2024)

NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning
by: Shah, Sahil, et al.
Published: (2025)

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
by: Li, Jiaze, et al.
Published: (2025)

Temporal Preference Optimization for Long-Form Video Understanding
by: Li, Rui, et al.
Published: (2025)

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
by: He, Bo, et al.
Published: (2024)

Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)

Narrative Aligned Long Form Video Question Answering
by: Jain, Rahul, et al.
Published: (2026)

Language Repository for Long Video Understanding
by: Kahatapitiya, Kumara, et al.
Published: (2024)

VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
by: Zhi, Zhuo, et al.
Published: (2025)

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
by: Qiu, Jihao, et al.
Published: (2026)

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
by: Chen, Qirui, et al.
Published: (2024)

From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
by: Sun, Guangyu, et al.
Published: (2025)

Towards Long Video Understanding via Fine-detailed Video Story Generation
by: You, Zeng, et al.
Published: (2024)

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)

Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism
by: Chen, Tao, et al.
Published: (2026)

Linear Scaling Video VLMs for Long Video Understanding
by: Eyzaguirre, Cristobal, et al.
Published: (2026)

Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)