| Main Authors: | Huang, Haifeng; Li, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03414 |
Similar Items
InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models
by: Liu, Xinxin, et al.
Published: (2026)
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
by: Wang, Yiyu, et al.
Published: (2025)
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
by: Tao, Keda, et al.
Published: (2024)
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
by: Tan, Xudong, et al.
Published: (2025)
Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
by: Cao, Sihan, et al.
Published: (2026)
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
by: Yang, Chenyu, et al.
Published: (2024)
Variation-aware Vision Token Dropping for Faster Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning
by: Ma, Yinchao, et al.
Published: (2026)
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
by: Huang, Mouxiao, et al.
Published: (2025)
A Survey of Token Compression for Efficient Multimodal Large Language Models
by: Shao, Kele, et al.
Published: (2025)
Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)
OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models
by: Kang, Minseok, et al.
Published: (2026)
UniCompress: Token Compression for Unified Vision-Language Understanding and Generation
by: Wang, Ziyao, et al.
Published: (2026)
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
by: Huang, Xiaohu, et al.
Published: (2024)
OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models
by: Yang, Morunliu, et al.
Published: (2026)
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
by: Liu, Zhihang, et al.
Published: (2025)
Aligning Effective Tokens with Video Anomaly in Large Language Models
by: Chen, Yingxian, et al.
Published: (2025)
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
by: Li, Yizhuo, et al.
Published: (2024)
Rate-aware Compression for NeRF-based Volumetric Video
by: Zhang, Zhiyu, et al.
Published: (2024)
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
HoliTom: Holistic Token Merging for Fast Video Large Language Models
by: Shao, Kele, et al.
Published: (2025)
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
by: Yao, Linli, et al.
Published: (2024)
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
by: Wang, Han, et al.
Published: (2024)
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
by: Lei, Lei, et al.
Published: (2025)
A Refer-and-Ground Multimodal Large Language Model for Biomedicine
by: Huang, Xiaoshuang, et al.
Published: (2024)
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
by: Peng, Tianfan, et al.
Published: (2025)
Geometry-Guided 3D Visual Token Pruning for Video-Language Models
by: Li, Han, et al.
Published: (2026)
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
by: Guo, Yuxiang, et al.
Published: (2024)
GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding
by: Fan, Rong, et al.
Published: (2026)
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
by: Liu, Xuyang, et al.
Published: (2025)
LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models
by: Liu, Juntao, et al.
Published: (2025)
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
by: Tao, Keda, et al.
Published: (2025)
Token Reduction via Local and Global Contexts Optimization for Efficient Video Large Language Models
by: Li, Jinlong, et al.
Published: (2026)
FCoT-VL: Advancing Text-oriented Large Vision-Language Models with Efficient Visual Token Compression
by: Li, Jianjian, et al.
Published: (2025)
EvoCut: Multi-Layer Evolution-Aware Visual Token Compression for Efficient Large Vision-Language Models
by: Lu, Hongyu, et al.
Published: (2026)
Unified Spatiotemporal Token Compression for Video-LLMs at Ultra-Low Retention
by: Du, Junhao, et al.
Published: (2026)
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
by: Wang, Mengyue, et al.
Published: (2025)
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
by: Lan, Xiaohan, et al.
Published: (2024)
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
by: Li, Duo, et al.
Published: (2025)