Library Catalog

Bibliographic Details
Main Authors:	Liu, Jinming, Huang, Jianguo, Jia, Zhaoyang, Li, Jiahao, Zhang, Xiaoyi, Guo, Zongyu, Li, Bin, Zeng, Wenjun, Lu, Yan, Jin, Xin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.17921

Similar Items

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
by: Zhang, Xiaoyi, et al.
Published: (2025)

Generative Latent Video Compression
by: Guo, Zongyu, et al.
Published: (2025)

When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
by: Liu, Jinming, et al.
Published: (2025)

Generative Video Compression with One-Dimensional Latent Representation
by: Zheng, Zihan, et al.
Published: (2026)

CoD: A Diffusion Foundation Model for Image Compression
by: Jia, Zhaoyang, et al.
Published: (2025)

CoD-Lite: Real-Time Diffusion-Based Generative Image Compression
by: Jia, Zhaoyang, et al.
Published: (2026)

Generation Navigator: A State-Aware Agentic Framework for Image Generation
by: Liu, Jinming, et al.
Published: (2026)

Efficient Autoregressive Video Diffusion with Dummy Head
by: Guo, Hang, et al.
Published: (2026)

InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
by: Zhang, Ziyun, et al.
Published: (2026)

Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression
by: Qi, Linfeng, et al.
Published: (2025)

Towards Practical Real-Time Neural Video Compression
by: Jia, Zhaoyang, et al.
Published: (2025)

Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance
by: Xue, Naifu, et al.
Published: (2025)

Generative Latent Coding for Ultra-Low Bitrate Image Compression
by: Jia, Zhaoyang, et al.
Published: (2025)

Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
by: Li, Jialuo, et al.
Published: (2025)

Neural Video Compression with Feature Modulation
by: Li, Jiahao, et al.
Published: (2024)

One-Step Diffusion-Based Image Compression with Semantic Distillation
by: Xue, Naifu, et al.
Published: (2025)

DLF: Extreme Image Compression with Dual-generative Latent Fusion
by: Xue, Naifu, et al.
Published: (2025)

A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding
by: Zhang, Yue, et al.
Published: (2026)

Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
by: Jin, Xin, et al.
Published: (2026)

StreamForest: Efficient Online Video Understanding with Persistent Event Memory
by: Zeng, Xiangyu, et al.
Published: (2025)

Uncertainty-Aware Deep Video Compression with Ensembles
by: Ma, Wufei, et al.
Published: (2024)

Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)

Agentic Video Intelligence: A Flexible Framework for Advanced Video Exploration and Understanding
by: Gao, Hong, et al.
Published: (2025)

Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
by: Liu, Jinming, et al.
Published: (2024)

LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval
by: Ning, Zhenyu, et al.
Published: (2025)

Streaming Long Video Understanding with Large Language Models
by: Qian, Rui, et al.
Published: (2024)

One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception
by: Li, Bohan, et al.
Published: (2023)

Agentic Very Long Video Understanding
by: Rege, Aniket, et al.
Published: (2026)

MAL: Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance
by: Huang, Wenjun, et al.
Published: (2024)

Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
by: Yin, Xinlei, et al.
Published: (2026)

COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework
by: Dong, Xin, et al.
Published: (2024)

Closed-Loop Unsupervised Representation Disentanglement with $β$-VAE Distillation and Diffusion Probabilistic Feedback
by: Jin, Xin, et al.
Published: (2024)

StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding
by: Wang, Junxi, et al.
Published: (2026)

Semantics Disentanglement and Composition for Universal Image Coding with Efficiently LLM Reasoning and Generative Diffusion
by: Liu, Jinming, et al.
Published: (2024)

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
by: Chen, Guo, et al.
Published: (2024)

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
by: Zhang, Haoji, et al.
Published: (2025)

StreamingTOM: Streaming Token Compression for Efficient Video Understanding
by: Chen, Xueyi, et al.
Published: (2025)

AURA: Always-On Understanding and Real-Time Assistance via Video Streams
by: Lu, Xudong, et al.
Published: (2026)

VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
by: Li, Ruanjun, et al.
Published: (2025)

A Coding Framework and Benchmark towards Low-Bitrate Video Understanding
by: Tian, Yuan, et al.
Published: (2022)