:: Library Catalog

Bibliographic Details
Main Authors:	Kim, Jiwan; Kim, Kibum; Kim, Wonjoong; Lee, Byung-Kwan; Park, Chanyoung
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.12358

Similar Items

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
by: Kim, Jiwan, et al.
Published: (2025)

Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)

SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
by: Kim, Wonjoong, et al.
Published: (2024)

v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
by: Chung, Jiwan, et al.
Published: (2025)

Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
by: Jeon, Jaehyeong, et al.
Published: (2024)

RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning
by: Yoon, Kanghoon, et al.
Published: (2024)

Adaptive Self-training Framework for Fine-grained Scene Graph Generation
by: Kim, Kibum, et al.
Published: (2024)

Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
by: Kim, Kibum, et al.
Published: (2025)

ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning
by: Lee, Yuna, et al.
Published: (2026)

Does Visual Token Pruning Improve Calibration? An Empirical Study on Confidence in MLLMs
by: Tan, Kaizhen
Published: (2026)

SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
by: Shin, Youngwoo, et al.
Published: (2026)

GridPrune: From "Where to Look" to "What to Select" in Visual Token Pruning for MLLMs
by: Duan, Yuxiang, et al.
Published: (2025)

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
by: Kim, Kibum, et al.
Published: (2023)

IDPruner: Harmonizing Importance and Diversity in Visual Token Pruning for MLLMs
by: Tan, Yifan, et al.
Published: (2026)

When Token Pruning is Worse than Random: Understanding Visual Token Information in VLLMs
by: Wang, Yahong, et al.
Published: (2025)

Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
by: Kim, Jongha, et al.
Published: (2026)

ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
by: Li, Duo, et al.
Published: (2025)

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
by: Chung, Jiwan, et al.
Published: (2024)

Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens
by: Lew, Jaihyun, et al.
Published: (2024)

DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning
by: Choi, Joonmyung, et al.
Published: (2026)

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)

CoLLaVO: Crayon Large Language and Vision mOdel
by: Lee, Byung-Kwan, et al.
Published: (2024)

MoAI: Mixture of All Intelligence for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)

EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
by: Chen, Yuhao, et al.
Published: (2026)

Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning
by: Lee, Naeun, et al.
Published: (2026)

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?
by: Yin, Hao, et al.
Published: (2025)

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
by: Baek, Changwoo, et al.
Published: (2026)

RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs
by: Li, Hongliang, et al.
Published: (2025)

Focus, Don't Prune: Identifying Instruction-Relevant Regions for Information-Rich Image Understanding
by: Kwon, Mincheol, et al.
Published: (2026)

Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
by: Cho, Beomsik, et al.
Published: (2025)

Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs
by: Kim, Sanghwan, et al.
Published: (2025)

A More Word-like Image Tokenization for MLLMs
by: Lee, Hyun, et al.
Published: (2026)

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
by: Chi, Donghwan, et al.
Published: (2025)

Frequency-Aware Token Reduction for Efficient Vision Transformer
by: Lee, Dong-Jae, et al.
Published: (2025)

LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
by: Lou, Haoran, et al.
Published: (2025)

IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models
by: Lee, Dong-Jae, et al.
Published: (2026)

Representation Shift: Unifying Token Compression with FlashAttention
by: Choi, Joonmyung, et al.
Published: (2025)

How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
by: Liu, Guimeng, et al.
Published: (2026)

Structured State-Space Regularization for Generation-Friendly Image Tokenization
by: Lee, Jinsung, et al.
Published: (2026)

Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)