Saved in:
| Main Authors: | Li, Yangfu, Zhan, Hongjian, Chen, Tianyi, Liu, Qi, Lu, Yue |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.10118 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models
by: Li, Yangfu, et al.
Published: (2026)
by: Li, Yangfu, et al.
Published: (2026)
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
by: Ye, Weihao, et al.
Published: (2024)
by: Ye, Weihao, et al.
Published: (2024)
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
by: Bai, Tianyi, et al.
Published: (2025)
by: Bai, Tianyi, et al.
Published: (2025)
PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding
by: Wang, Nan, et al.
Published: (2026)
by: Wang, Nan, et al.
Published: (2026)
UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking
by: Wu, Hao, et al.
Published: (2026)
by: Wu, Hao, et al.
Published: (2026)
The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning
by: Jiang, Titong, et al.
Published: (2025)
by: Jiang, Titong, et al.
Published: (2025)
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
by: Zhong, Yiwu, et al.
Published: (2024)
by: Zhong, Yiwu, et al.
Published: (2024)
Efficient Whole Slide Pathology VQA via Token Compression
by: Lyu, Weimin, et al.
Published: (2025)
by: Lyu, Weimin, et al.
Published: (2025)
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
by: Chung, Jiwan, et al.
Published: (2025)
by: Chung, Jiwan, et al.
Published: (2025)
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
by: Huang, Yihong, et al.
Published: (2026)
by: Huang, Yihong, et al.
Published: (2026)
Efficient Vision-Language Reasoning via Adaptive Token Pruning
by: Li, Xue, et al.
Published: (2025)
by: Li, Xue, et al.
Published: (2025)
Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling
by: Zou, Hongjian, et al.
Published: (2026)
by: Zou, Hongjian, et al.
Published: (2026)
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
by: Zhang, Qizhe, et al.
Published: (2024)
by: Zhang, Qizhe, et al.
Published: (2024)
HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)
by: Guo, Yansong, et al.
Published: (2026)
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)
by: Wen, Zichen, et al.
Published: (2025)
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
by: Wu, Hao, et al.
Published: (2026)
by: Wu, Hao, et al.
Published: (2026)
CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models
by: Tang, Zicong, et al.
Published: (2025)
by: Tang, Zicong, et al.
Published: (2025)
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding
by: Zhang, Hongzhi, et al.
Published: (2025)
by: Zhang, Hongzhi, et al.
Published: (2025)
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
by: Li, Duo, et al.
Published: (2025)
by: Li, Duo, et al.
Published: (2025)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
by: Li, Yangfu, et al.
Published: (2026)
FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing
by: Tang, Zihan, et al.
Published: (2026)
by: Tang, Zihan, et al.
Published: (2026)
Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention
by: Tian, Changyuan, et al.
Published: (2026)
by: Tian, Changyuan, et al.
Published: (2026)
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
by: Ji, Yicheng, et al.
Published: (2025)
by: Ji, Yicheng, et al.
Published: (2025)
See the Text: From Tokenization to Visual Reading
by: Xing, Ling, et al.
Published: (2025)
by: Xing, Ling, et al.
Published: (2025)
Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
by: Yan, Yibo, et al.
Published: (2026)
by: Yan, Yibo, et al.
Published: (2026)
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
by: Song, Wei, et al.
Published: (2025)
by: Song, Wei, et al.
Published: (2025)
Efficient Document Parsing via Parallel Token Prediction
by: Li, Lei, et al.
Published: (2026)
by: Li, Lei, et al.
Published: (2026)
Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding
by: Kim, Jiwan, et al.
Published: (2026)
by: Kim, Jiwan, et al.
Published: (2026)
Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
by: Liu, Zhuchenyang, et al.
Published: (2026)
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
by: Yuan, Qianhao, et al.
Published: (2025)
by: Yuan, Qianhao, et al.
Published: (2025)
Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
by: Li, Jianwei, et al.
Published: (2023)
by: Li, Jianwei, et al.
Published: (2023)
BabyVision: Visual Reasoning Beyond Language
by: Chen, Liang, et al.
Published: (2026)
by: Chen, Liang, et al.
Published: (2026)
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
by: Li, Kaiyuan, et al.
Published: (2025)
by: Li, Kaiyuan, et al.
Published: (2025)
EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models
by: Zhu, Wenhui, et al.
Published: (2025)
by: Zhu, Wenhui, et al.
Published: (2025)
Unveil: Unified Visual-Textual Integration and Distillation for Multi-modal Document Retrieval
by: Sun, Hao, et al.
Published: (2026)
by: Sun, Hao, et al.
Published: (2026)
Agri-R1: Agricultural Reasoning for Disease Diagnosis via Automated-Synthesis and Reinforcement Learning
by: Zhang, Wentao, et al.
Published: (2026)
by: Zhang, Wentao, et al.
Published: (2026)
Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
by: Son, Jaemin, et al.
Published: (2025)
by: Son, Jaemin, et al.
Published: (2025)
Why Instruction-Based Unlearning Fails in Diffusion Models?
by: Zhang, Zeliang, et al.
Published: (2026)
by: Zhang, Zeliang, et al.
Published: (2026)
ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning
by: Lee, Yuna, et al.
Published: (2026)
by: Lee, Yuna, et al.
Published: (2026)
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
by: Zhang, Qizhe, et al.
Published: (2025)
by: Zhang, Qizhe, et al.
Published: (2025)
Similar Items
-
DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models
by: Li, Yangfu, et al.
Published: (2026) -
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
by: Ye, Weihao, et al.
Published: (2024) -
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
by: Bai, Tianyi, et al.
Published: (2025) -
PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding
by: Wang, Nan, et al.
Published: (2026) -
UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking
by: Wu, Hao, et al.
Published: (2026)