| Main Authors: | Dang, Yunkai, Jiang, Yifan, Jiang, Yizhu, Chen, Anqi, Li, Wenbin, Gao, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.17274 |
Similar Items
CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models
by: Dang, Yunkai, et al.
Published: (2026)
UHR-BAT: Budget-Aware Token Compression Vision-Language model for Ultra-High-Resolution Remote Sensing
by: Dang, Yunkai, et al.
Published: (2026)
FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing
by: Dang, Yunkai, et al.
Published: (2025)
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
by: Qu, Liao, et al.
Published: (2024)
Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation
by: Zhao, Yunpu, et al.
Published: (2025)
Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning
by: Yang, Jiacheng, et al.
Published: (2026)
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)
Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
by: Jiang, Yibo, et al.
Published: (2026)
UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model
by: Zhuang, Shaobin, et al.
Published: (2026)
A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs
by: Dang, Yunkai, et al.
Published: (2025)
UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
by: Ma, Chuofan, et al.
Published: (2024)
SafeVid: Toward Safety Aligned Video Large Multimodal Models
by: Wang, Yixu, et al.
Published: (2025)
AviationLMM: A Large Multimodal Foundation Model for Civil Aviation
by: Li, Wenbin, et al.
Published: (2026)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
by: Zhang, Yue, et al.
Published: (2024)
Mode-as-Sequence: Translating Multimodal Motion Prediction into Unified Sequential Mode Modeling
by: Zhou, Zikang, et al.
Published: (2026)
Adversarial Prompt Injection Attack on Multimodal Large Language Models
by: Ding, Meiwen, et al.
Published: (2026)
Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models
by: Xing, Wenbin, et al.
Published: (2026)
Lance: Unified Multimodal Modeling by Multi-Task Synergy
by: Fu, Fengyi, et al.
Published: (2026)
AToken: A Unified Tokenizer for Vision
by: Lu, Jiasen, et al.
Published: (2025)
LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
by: Zhao, Zihui, et al.
Published: (2025)
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
by: Zhang, Huichao, et al.
Published: (2026)
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
by: Fang, Yiyang, et al.
Published: (2026)
NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models
by: Wu, Kai, et al.
Published: (2024)
A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)
PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models
by: Liu, Yingen, et al.
Published: (2024)
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
by: Lin, Zhihang, et al.
Published: (2024)
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
by: Zhou, Ziqin, et al.
Published: (2025)
What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph
by: Jiang, Yutao, et al.
Published: (2025)
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
by: Wen, Zimo, et al.
Published: (2026)
Semantic Generative Tuning for Unified Multimodal Models
by: Yu, Songsong, et al.
Published: (2026)
SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification
by: Zhang, Yunkai, et al.
Published: (2025)
Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor
by: Guan, Xiaoliu, et al.
Published: (2025)
Fair-Eye Net: A Fair, Trustworthy, Multimodal Integrated Glaucoma Full Chain AI System
by: Wei, Wenbin, et al.
Published: (2026)
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
by: Wu, Penghao, et al.
Published: (2025)
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
by: Chen, Zisheng, et al.
Published: (2025)
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
by: Jiang, Xi, et al.
Published: (2024)
Toward Cognitive Supersensing in Multimodal Large Language Model
by: Li, Boyi, et al.
Published: (2026)