
Bibliographic Details
Main Authors:	Wang, Ziyi; Wu, Haoran; Rong, Yiming; Jiang, Deyang; Zhang, Yixin; Zhao, Yunlong; Xu, Shuang; Xu, Bo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.06835

Similar Items

Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition
by: Rong, Yiming, et al.
Published: (2025)

Linear Scaling Video VLMs for Long Video Understanding
by: Eyzaguirre, Cristobal, et al.
Published: (2026)

LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression
by: Jiang, Wei, et al.
Published: (2024)

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
by: Shen, Xiaoqian, et al.
Published: (2024)

Video-QTR: Query-Driven Temporal Reasoning Framework for Lightweight Video Understanding
by: Zhao, Xinkui, et al.
Published: (2025)

NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning
by: Uricchio, Tiberio, et al.
Published: (2026)

Uni-LVC: A Unified Method for Intra- and Inter-Mode Learned Video Compression
by: Zhang, Yichi, et al.
Published: (2026)

Task-Aware KV Compression For Cost-Effective Long Video Understanding
by: Qin, Minghao, et al.
Published: (2025)

Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)

LightZeroNav: Zero-Shot Vision Language Navigation in Continuous Environments Based on Lightweight VLMs
by: Luo, Kun, et al.
Published: (2026)

FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)

Stateful Token Reduction for Long-Video Hybrid VLMs
by: Jiang, Jindong, et al.
Published: (2026)

Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding
by: Wang, Zheng, et al.
Published: (2026)

Long Story Short: Disentangling Compositionality and Long-Caption Understanding in Contrastive VLMs
by: Salazar, Israfel, et al.
Published: (2025)

EEA: Exploration-Exploitation Agent for Long Video Understanding
by: Yang, Te, et al.
Published: (2025)

LoViC: Efficient Long Video Generation with Context Compression
by: Jiang, Jiaxiu, et al.
Published: (2025)

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
by: Ma, Wentao, et al.
Published: (2025)

OmniVid: A Generative Framework for Universal Video Understanding
by: Wang, Junke, et al.
Published: (2024)

A Unified Framework for Human-centric Point Cloud Video Understanding
by: Xu, Yiteng, et al.
Published: (2024)

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
by: Xie, Yiweng, et al.
Published: (2026)

Towards Lossless Ultimate Vision Token Compression for VLMs
by: Zheng, Dehua, et al.
Published: (2025)

Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising
by: Yuan, Yunlong, et al.
Published: (2025)

Towards Event-oriented Long Video Understanding
by: Du, Yifan, et al.
Published: (2024)

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation
by: Wang, Ziyi, et al.
Published: (2026)

StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding
by: Wang, Junxi, et al.
Published: (2026)

Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding
by: Yamao, Sosuke, et al.
Published: (2026)

CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding
by: Patel, Shrenik, et al.
Published: (2025)

Controllable Generative Video Compression
by: Ding, Ding, et al.
Published: (2026)

MLVU: Benchmarking Multi-task Long Video Understanding
by: Zhou, Junjie, et al.
Published: (2024)

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
by: Shu, Yan, et al.
Published: (2024)

Benchmarking and Enhancing VLM for Compressed Image Understanding
by: Zhang, Zifu, et al.
Published: (2025)

VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
by: Lan, Xiaohan, et al.
Published: (2024)

VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
by: Zhang, Xuanyu, et al.
Published: (2025)

VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?
by: Wang, Jiaqi, et al.
Published: (2025)

TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding
by: Guo, Chaohong, et al.
Published: (2025)

METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
by: Wang, Mengyue, et al.
Published: (2025)

Generative Frame Sampler for Long Video Understanding
by: Yao, Linli, et al.
Published: (2025)

Extreme Video Compression with Pre-trained Diffusion Models
by: Li, Bohan, et al.
Published: (2024)

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
by: Li, Jungang, et al.
Published: (2024)