| Main Authors: | Hu, Lulu, Xiao, Wenhu, Chen, Xin, Xu, Xinhua, Xu, Bowen, Li, Kun, Tao, Yongliang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.04800 |
Similar Items
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
by: Zhou, Guanyu, et al.
Published: (2024)
Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models
by: Li, Yuanbo, et al.
Published: (2026)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
by: Yu, JiangYong, et al.
Published: (2025)
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
by: Li, Shiyao, et al.
Published: (2024)
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)
Semantic Alignment for Multimodal Large Language Models
by: Wu, Tao, et al.
Published: (2024)
Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
by: Xiang, Ziwei, et al.
Published: (2026)
Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI
by: Li, Mingjie, et al.
Published: (2026)
Are Multimodal Large Language Models Good Annotators for Image Tagging?
by: Xie, Ming-Kun, et al.
Published: (2026)
OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models
by: Yang, Morunliu, et al.
Published: (2026)
KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing
by: Jiang, Siyu, et al.
Published: (2026)
Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism
by: Chen, Tao, et al.
Published: (2026)
MMaDA: Multimodal Large Diffusion Language Models
by: Yang, Ling, et al.
Published: (2025)
LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models
by: Hu, Qingqiao, et al.
Published: (2025)
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
by: Leng, Sicong, et al.
Published: (2024)
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
by: Yu, Xiaomin, et al.
Published: (2026)
LISA: A Layer-wise Integration and Suppression Approach for Hallucination Mitigation in Multimodal Large Language Models
by: Guo, Zhihui, et al.
Published: (2025)
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
by: Yao, Ruilin, et al.
Published: (2025)
FastSmoothSAM: A Fast Smooth Method For Segment Anything Model
by: Xu, Jiasheng, et al.
Published: (2025)
Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models
by: Sun, Jingchen, et al.
Published: (2026)
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
by: Xu, Jinjin, et al.
Published: (2023)
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
by: Cai, Rui, et al.
Published: (2025)
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation
by: Xie, Jingjing, et al.
Published: (2024)
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
by: Liao, Haicheng, et al.
Published: (2023)
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
by: Yang, Zuopeng, et al.
Published: (2025)
Modeling Variants of Prompts for Vision-Language Models
by: Li, Ao, et al.
Published: (2025)
Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)
Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images
by: Zhang, Jiaxin, et al.
Published: (2024)
VEQ: Modality-Adaptive Quantization for MoE Vision-Language Models
by: Qin, Guangshuo, et al.
Published: (2026)
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
by: Schneider, Benjamin, et al.
Published: (2025)
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
by: Du, Sinan, et al.
Published: (2025)
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
by: Zhong, Yi, et al.
Published: (2026)
Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval
by: Chen, Tao, et al.
Published: (2025)
MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment
by: Xu, Huangbiao, et al.
Published: (2025)
Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
by: Dong, Xinpeng, et al.
Published: (2026)
Exploring Diverse In-Context Configurations for Image Captioning
by: Yang, Xu, et al.
Published: (2023)
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
by: Xu, Mingjie, et al.
Published: (2025)
Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
by: Shi, Yuheng, et al.
Published: (2026)
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
by: Dongfang, Zihao, et al.
Published: (2025)