| Main Authors: | Huang, Zhaohong, Liu, Wenjing, Zhang, Yuxin, Chao, Fei, Ji, Rongrong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.05601 |
Similar Items
Prototype-Based Test-Time Adaptation of Vision-Language Models
by: Huang, Zhaohong, et al.
Published: (2026)
GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models
by: Huang, Zhaohong, et al.
Published: (2025)
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
by: Jiang, Pengfei, et al.
Published: (2025)
DS$^2$Net: Detail-Semantic Deep Supervision Network for Medical Image Segmentation
by: Huang, Zhaohong, et al.
Published: (2025)
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
by: Ma, Yiwei, et al.
Published: (2025)
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)
IDPruner: Harmonizing Importance and Diversity in Visual Token Pruning for MLLMs
by: Tan, Yifan, et al.
Published: (2026)
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
by: Lin, Zhihang, et al.
Published: (2024)
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
by: Hwang, Yerin, et al.
Published: (2025)
CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
by: Yan, Yichen, et al.
Published: (2025)
Learning Image Demoireing from Unpaired Real Data
by: Zhong, Yunshan, et al.
Published: (2024)
Event-Anchored Frame Selection for Effective Long-Video Understanding
by: Chen, Wang, et al.
Published: (2026)
KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMs
by: Song, Baiyang, et al.
Published: (2026)
Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
by: Zhan, Wengyi, et al.
Published: (2025)
AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection
by: Zhang, Shuheng, et al.
Published: (2025)
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)
Vision Remember: Recovering Visual Information in Efficient LVLM with Vision Feature Resampling
by: Feng, Ze, et al.
Published: (2025)
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
by: Zhang, Liang, et al.
Published: (2024)
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
by: Zhu, Lanyun, et al.
Published: (2025)
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference
by: Pathak, Surendra, et al.
Published: (2026)
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
by: Zhang, Pingping, et al.
Published: (2024)
QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
by: Xie, Fei, et al.
Published: (2024)
MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
Importance-Based Token Merging for Efficient Image and Video Generation
by: Wu, Haoyu, et al.
Published: (2024)
Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
by: Zhong, Yunshan, et al.
Published: (2024)
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
by: Zhong, Yunshan, et al.
Published: (2024)
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
by: Zhang, Yuan, et al.
Published: (2024)
HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models
by: Zhu, Qihui, et al.
Published: (2026)
D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning
by: Zhang, Evelyn, et al.
Published: (2025)
Selective Visual Prompting in Vision Mamba
by: Yao, Yifeng, et al.
Published: (2024)
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
by: Zhuang, Xianwei, et al.
Published: (2025)
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck
by: Bulat, Adrian, et al.
Published: (2025)
Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs
by: Liu, Yuxin, et al.
Published: (2026)
Visual Implicit Autoregressive Modeling
by: Jiang, Pengfei, et al.
Published: (2026)
Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
by: Cho, Beomsik, et al.
Published: (2025)
Select2Col: Leveraging Spatial-Temporal Importance of Semantic Information for Efficient Collaborative Perception
by: Liu, Yuntao, et al.
Published: (2023)
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
by: Luo, Siqi, et al.
Published: (2025)
Compressor-VLA: Instruction-Guided Visual Token Compression for Efficient Robotic Manipulation
by: Gao, Juntao, et al.
Published: (2025)
FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
by: Cai, Kaitong, et al.
Published: (2025)
ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion
by: Zhang, Ziyue, et al.
Published: (2024)