| Main Authors: | Luo, Wen; Chen, Peng; Huang, Xiaotao; Huang, LiQun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.17818 |
Similar Items
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
by: You, Haoran, et al.
Published: (2022)
ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
by: Zhang, Yongheng, et al.
Published: (2025)
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
by: Chen, Hanning, et al.
Published: (2025)
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
by: Huang, Xiaohu, et al.
Published: (2024)
Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers
by: Ma, Ji, et al.
Published: (2025)
Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model
by: Huang, Peishan, et al.
Published: (2025)
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
by: Yue, Tongtian, et al.
Published: (2025)
CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models
by: Tang, Zicong, et al.
Published: (2025)
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
by: Endo, Mark, et al.
Published: (2024)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
Collaborative Multi-Mode Pruning for Vision-Language Models
by: Wu, Zimeng, et al.
Published: (2026)
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
by: Yan, Siming, et al.
Published: (2024)
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
by: Liu, Yuqi, et al.
Published: (2025)
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
by: Luan, Bozhi, et al.
Published: (2025)
Topology-Aware Layer Pruning for Large Vision-Language Models
by: Zheng, Pengcheng, et al.
Published: (2026)
EntropyPrune: Matrix Entropy Guided Visual Token Pruning for Multimodal Large Language Models
by: Wang, Yahong, et al.
Published: (2026)
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
by: Lee, Yujian, et al.
Published: (2026)
Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues
by: Feng, X., et al.
Published: (2024)
GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models
by: Pei, Ruiguang, et al.
Published: (2025)
Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models
by: Ma, Jie, et al.
Published: (2026)
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)
INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models
by: Madinei, Parsa, et al.
Published: (2025)
FoPru: Focal Pruning for Efficient Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2024)
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
by: Zeng, Yu, et al.
Published: (2026)
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning
by: Xu, Jingqi, et al.
Published: (2025)
IKOD: Mitigating Visual Attention Degradation in Large Vision-Language Models
by: Yang, Jiabing, et al.
Published: (2025)
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
by: Liu, Yuchen, et al.
Published: (2025)
ViTOC: Vision Transformer and Object-aware Captioner
by: Huang, Feiyang
Published: (2024)
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
by: Zeng, Weili, et al.
Published: (2025)
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
by: Cao, Jianjian, et al.
Published: (2024)
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
by: Zheng, Henry, et al.
Published: (2025)
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
by: Suo, Wei, et al.
Published: (2024)
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
by: Liu, Yang, et al.
Published: (2025)
Prune2Drive: A Plug-and-Play Framework for Accelerating Vision-Language Models in Autonomous Driving
by: Xiong, Minhao, et al.
Published: (2025)
LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation
by: Jeon, Hyunsik, et al.
Published: (2025)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
by: Jin, Peng, et al.
Published: (2023)
PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models
by: Meng, Yu, et al.
Published: (2025)
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
by: Wang, Hanzhen, et al.
Published: (2025)