| Main Authors: | Yang, Zhongyu, Xu, Dannong, Pang, Wei, Yuan, Yingfang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.01949 |
Similar Items
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
by: Yang, Zhongyu, et al.
Published: (2025)
MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents
by: Xu, Dannong, et al.
Published: (2026)
SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration
by: Yang, Zhongyu, et al.
Published: (2026)
EntropyPrune: Matrix Entropy Guided Visual Token Pruning for Multimodal Large Language Models
by: Wang, Yahong, et al.
Published: (2026)
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
by: Lee, Jaewoo, et al.
Published: (2025)
TrimTokenator: Towards Adaptive Visual Token Pruning for Large Multimodal Models
by: Zhang, Hao, et al.
Published: (2025)
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
by: Huang, Xiaohu, et al.
Published: (2024)
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
by: Yang, Zhongyu, et al.
Published: (2025)
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
by: Yao, Linli, et al.
Published: (2024)
TrimTokenator-LC: Towards Adaptive Visual Token Pruning for Large Multimodal Models with Long Contexts
by: Zhang, Hao, et al.
Published: (2025)
Attention Debiasing for Token Pruning in Vision Language Models
by: Zhao, Kai, et al.
Published: (2025)
Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study
by: Huang, Yiran, et al.
Published: (2025)
HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models
by: Zhu, Qihui, et al.
Published: (2026)
TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model
by: Li, Ao, et al.
Published: (2025)
IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models
by: Lee, Dong-Jae, et al.
Published: (2026)
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
by: Alvar, Saeed Ranjbar, et al.
Published: (2025)
Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token Pruning
by: Zhang, Dingkun, et al.
Published: (2026)
Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
by: Zhang, Jianing, et al.
Published: (2026)
Pyramid Token Pruning for High-Resolution Large Vision-Language Models via Region, Token, and Instruction-Guided Importance
by: Liang, Yuxuan, et al.
Published: (2025)
Less is More: Token-Efficient Video-QA via Adaptive Frame-Pruning and Semantic Graph Integration
by: Wang, Shaoguang, et al.
Published: (2025)
CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning
by: Li, Yanshu, et al.
Published: (2025)
Semantic Alignment for Multimodal Large Language Models
by: Wu, Tao, et al.
Published: (2024)
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
by: Luan, Bozhi, et al.
Published: (2025)
Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models
by: Ma, Kexin, et al.
Published: (2026)
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
by: Ye, Xubing, et al.
Published: (2024)
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
by: Tan, Xudong, et al.
Published: (2025)
Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models
by: Li, Siyou, et al.
Published: (2025)
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
by: Yang, Sihan, et al.
Published: (2025)
HiPrune: Hierarchical Attention for Efficient Token Pruning in Vision-Language Models
by: Liu, Jizhihui, et al.
Published: (2025)
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
by: Xia, Guoyang, et al.
Published: (2025)
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
by: Yang, Cheng, et al.
Published: (2025)
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
by: Chow, Wei, et al.
Published: (2025)
Grounding Everything in Tokens for Multimodal Large Language Models
by: Ren, Xiangxuan, et al.
Published: (2025)
HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models
by: Pei, Ruiguang, et al.
Published: (2025)
What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph
by: Jiang, Yutao, et al.
Published: (2025)
Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models
by: He, Landi, et al.
Published: (2026)