| Main Authors: | Kim, Jiwan, Kim, Kibum, Kim, Wonjoong, Lee, Byung-Kwan, Park, Chanyoung |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.12358 |
Similar Items
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
by: Kim, Jiwan, et al.
Published: (2025)
Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
by: Kim, Wonjoong, et al.
Published: (2024)
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
by: Chung, Jiwan, et al.
Published: (2025)
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
by: Jeon, Jaehyeong, et al.
Published: (2024)
RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning
by: Yoon, Kanghoon, et al.
Published: (2024)
Adaptive Self-training Framework for Fine-grained Scene Graph Generation
by: Kim, Kibum, et al.
Published: (2024)
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
by: Kim, Kibum, et al.
Published: (2025)
ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning
by: Lee, Yuna, et al.
Published: (2026)
Does Visual Token Pruning Improve Calibration? An Empirical Study on Confidence in MLLMs
by: Tan, Kaizhen
Published: (2026)
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
by: Shin, Youngwoo, et al.
Published: (2026)
GridPrune: From "Where to Look" to "What to Select" in Visual Token Pruning for MLLMs
by: Duan, Yuxiang, et al.
Published: (2025)
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
by: Kim, Kibum, et al.
Published: (2023)
IDPruner: Harmonizing Importance and Diversity in Visual Token Pruning for MLLMs
by: Tan, Yifan, et al.
Published: (2026)
When Token Pruning is Worse than Random: Understanding Visual Token Information in VLLMs
by: Wang, Yahong, et al.
Published: (2025)
Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
by: Kim, Jongha, et al.
Published: (2026)
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
by: Li, Duo, et al.
Published: (2025)
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
by: Chung, Jiwan, et al.
Published: (2024)
Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens
by: Lew, Jaihyun, et al.
Published: (2024)
DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
by: Choi, Joonmyung, et al.
Published: (2026)
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
CoLLaVO: Crayon Large Language and Vision mOdel
by: Lee, Byung-Kwan, et al.
Published: (2024)
MoAI: Mixture of All Intelligence for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
by: Chen, Yuhao, et al.
Published: (2026)
Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning
by: Lee, Naeun, et al.
Published: (2026)
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?
by: Yin, Hao, et al.
Published: (2025)
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
by: Baek, Changwoo, et al.
Published: (2026)
RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs
by: Li, Hongliang, et al.
Published: (2025)
Focus, Don't Prune: Identifying Instruction-Relevant Regions for Information-Rich Image Understanding
by: Kwon, Mincheol, et al.
Published: (2026)
Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
by: Cho, Beomsik, et al.
Published: (2025)
Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs
by: Kim, Sanghwan, et al.
Published: (2025)
A More Word-like Image Tokenization for MLLMs
by: Lee, Hyun, et al.
Published: (2026)
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
by: Chi, Donghwan, et al.
Published: (2025)
Frequency-Aware Token Reduction for Efficient Vision Transformer
by: Lee, Dong-Jae, et al.
Published: (2025)
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
by: Lou, Haoran, et al.
Published: (2025)
IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models
by: Lee, Dong-Jae, et al.
Published: (2026)
Representation Shift: Unifying Token Compression with FlashAttention
by: Choi, Joonmyung, et al.
Published: (2025)
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
by: Liu, Guimeng, et al.
Published: (2026)
Structured State-Space Regularization for Generation-Friendly Image Tokenization
by: Lee, Jinsung, et al.
Published: (2026)
Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)