| Main Authors: | Li, Xu; Liang, Yuxuan; Chen, Xiaolei; Zheng, Yi; Chen, Haotian; Li, Bin; Xue, Xiangyang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.13067 |
Similar Items
Pyramid Token Pruning for High-Resolution Large Vision-Language Models via Region, Token, and Instruction-Guided Importance
by: Liang, Yuxuan, et al.
Published: (2025)
Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models
by: Liang, Yuxuan, et al.
Published: (2025)
Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models
by: Li, Xu, et al.
Published: (2024)
ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models
by: Li, Xu, et al.
Published: (2026)
When Large Vision-Language Models Meet Person Re-Identification
by: Wang, Qizao, et al.
Published: (2024)
Variation-aware Vision Token Dropping for Faster Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models
by: Arif, Kazi Hasan Ibn, et al.
Published: (2024)
TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers
by: Wang, Guoxin, et al.
Published: (2025)
Rethinking Token Reduction for Large Vision-Language Models
by: Wang, Yi, et al.
Published: (2026)
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
by: Liu, Ting, et al.
Published: (2024)
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
by: Luan, Bozhi, et al.
Published: (2025)
Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
by: Shi, Fan, et al.
Published: (2025)
REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
by: He, Qiyuan, et al.
Published: (2025)
FlexAttention for Efficient High-Resolution Vision-Language Models
by: Li, Junyan, et al.
Published: (2024)
Visual-Advantage On-Policy Distillation for Vision-Language Models
by: Liu, Ruiqi, et al.
Published: (2026)
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
by: Luo, Junwei, et al.
Published: (2025)
RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
by: Fan, Jiaojiao, et al.
Published: (2024)
Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding
by: Xue, Haotian, et al.
Published: (2025)
Mitigating Cache Noise in Test-Time Adaptation for Large Vision-Language Models
by: Zhai, Haotian, et al.
Published: (2025)
HERO: Hierarchical Extrapolation and Refresh for Efficient World Models
by: Song, Quanjian, et al.
Published: (2025)
PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models
by: Meng, Yu, et al.
Published: (2025)
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
by: Jin, Yang, et al.
Published: (2023)
LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
by: Takezoe, Rinyoichi, et al.
Published: (2026)
Window Token Concatenation for Efficient Visual Large Language Models
by: Li, Yifan, et al.
Published: (2025)
Efficient Multi-modal Large Language Models via Visual Token Grouping
by: Huang, Minbin, et al.
Published: (2024)
Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models
by: Niu, Junbo, et al.
Published: (2025)
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
by: Lin, Yuanze, et al.
Published: (2024)
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
by: Zeng, Yu, et al.
Published: (2026)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression
by: Zhang, Xinwei, et al.
Published: (2026)
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
by: Zhao, Qinyu, et al.
Published: (2024)
Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models
by: Ma, Kexin, et al.
Published: (2026)
HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization
by: Chang, Joohyun, et al.
Published: (2025)
HERO: Human Reaction Generation from Videos
by: Yu, Chengjun, et al.
Published: (2025)
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
by: Li, Kaiyuan, et al.
Published: (2025)
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
by: Niu, Ke, et al.
Published: (2025)
Task-Aware Resolution Optimization for Visual Large Language Models
by: Luo, Weiqing, et al.
Published: (2025)
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
by: Li, Ling, et al.
Published: (2025)