| Main Authors: | Liu, Jinming, Huang, Jianguo, Jia, Zhaoyang, Li, Jiahao, Zhang, Xiaoyi, Guo, Zongyu, Li, Bin, Zeng, Wenjun, Lu, Yan, Jin, Xin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.17921 |
Similar Items
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
by: Zhang, Xiaoyi, et al.
Published: (2025)
Generative Latent Video Compression
by: Guo, Zongyu, et al.
Published: (2025)
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
by: Liu, Jinming, et al.
Published: (2025)
Generative Video Compression with One-Dimensional Latent Representation
by: Zheng, Zihan, et al.
Published: (2026)
CoD: A Diffusion Foundation Model for Image Compression
by: Jia, Zhaoyang, et al.
Published: (2025)
CoD-Lite: Real-Time Diffusion-Based Generative Image Compression
by: Jia, Zhaoyang, et al.
Published: (2026)
Generation Navigator: A State-Aware Agentic Framework for Image Generation
by: Liu, Jinming, et al.
Published: (2026)
Efficient Autoregressive Video Diffusion with Dummy Head
by: Guo, Hang, et al.
Published: (2026)
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
by: Zhang, Ziyun, et al.
Published: (2026)
Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression
by: Qi, Linfeng, et al.
Published: (2025)
Towards Practical Real-Time Neural Video Compression
by: Jia, Zhaoyang, et al.
Published: (2025)
Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance
by: Xue, Naifu, et al.
Published: (2025)
Generative Latent Coding for Ultra-Low Bitrate Image Compression
by: Jia, Zhaoyang, et al.
Published: (2025)
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
by: Li, Jialuo, et al.
Published: (2025)
Neural Video Compression with Feature Modulation
by: Li, Jiahao, et al.
Published: (2024)
One-Step Diffusion-Based Image Compression with Semantic Distillation
by: Xue, Naifu, et al.
Published: (2025)
DLF: Extreme Image Compression with Dual-generative Latent Fusion
by: Xue, Naifu, et al.
Published: (2025)
A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding
by: Zhang, Yue, et al.
Published: (2026)
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
by: Jin, Xin, et al.
Published: (2026)
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
by: Zeng, Xiangyu, et al.
Published: (2025)
Uncertainty-Aware Deep Video Compression with Ensembles
by: Ma, Wufei, et al.
Published: (2024)
Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)
Agentic Video Intelligence: A Flexible Framework for Advanced Video Exploration and Understanding
by: Gao, Hong, et al.
Published: (2025)
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
by: Liu, Jinming, et al.
Published: (2024)
LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval
by: Ning, Zhenyu, et al.
Published: (2025)
Streaming Long Video Understanding with Large Language Models
by: Qian, Rui, et al.
Published: (2024)
One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception
by: Li, Bohan, et al.
Published: (2023)
Agentic Very Long Video Understanding
by: Rege, Aniket, et al.
Published: (2026)
MAL: Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance
by: Huang, Wenjun, et al.
Published: (2024)
Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
by: Yin, Xinlei, et al.
Published: (2026)
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework
by: Dong, Xin, et al.
Published: (2024)
Closed-Loop Unsupervised Representation Disentanglement with $β$-VAE Distillation and Diffusion Probabilistic Feedback
by: Jin, Xin, et al.
Published: (2024)
StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding
by: Wang, Junxi, et al.
Published: (2026)
Semantics Disentanglement and Composition for Universal Image Coding with Efficiently LLM Reasoning and Generative Diffusion
by: Liu, Jinming, et al.
Published: (2024)
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
by: Chen, Guo, et al.
Published: (2024)
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
by: Zhang, Haoji, et al.
Published: (2025)
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
by: Chen, Xueyi, et al.
Published: (2025)
AURA: Always-On Understanding and Real-Time Assistance via Video Streams
by: Lu, Xudong, et al.
Published: (2026)
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
by: Li, Ruanjun, et al.
Published: (2025)
A Coding Framework and Benchmark towards Low-Bitrate Video Understanding
by: Tian, Yuan, et al.
Published: (2022)