| Main Authors: | Jia, Fankai, Gan, Daisong, Zhang, Zhe, Wen, Zhaochi, Dan, Chenchen, Liang, Dong, Wang, Haifeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.24888 |
Similar Items
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
by: Li, Xiujun, et al.
Published: (2023)
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
by: Liu, Jianyu, et al.
Published: (2025)
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
by: Wang, Weiyun, et al.
Published: (2024)
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models
by: Gan, Woody Haosheng, et al.
Published: (2025)
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
by: Qian, Yusu, et al.
Published: (2024)
Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding
by: Li, Yun, et al.
Published: (2025)
Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning
by: Liang, Zhengyang, et al.
Published: (2024)
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
by: Ruan, Jiacheng, et al.
Published: (2025)
A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
by: Zhang, Xiaofeng, et al.
Published: (2024)
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
by: HyperAI Team, et al.
Published: (2025)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
by: Huang, Wenxuan, et al.
Published: (2025)
UniChange: Unifying Change Detection with Multimodal Large Language Model
by: Zhang, Xu, et al.
Published: (2025)
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
by: Jiang, Chaoya, et al.
Published: (2024)
Scalable Vision Language Model Training via High Quality Data Curation
by: Dong, Hongyuan, et al.
Published: (2025)
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
by: Wang, Yuxuan, et al.
Published: (2024)
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models
by: Zhan, Xiaoyu, et al.
Published: (2025)
BLINK: Multimodal Large Language Models Can See but Not Perceive
by: Fu, Xingyu, et al.
Published: (2024)
Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
by: Xiao, Wenyi, et al.
Published: (2025)
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends
by: Tami, Mohammad Abu, et al.
Published: (2025)
MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models
by: Sharma, Harshita, et al.
Published: (2024)
Multimodal LLM With Hierarchical Mixture-of-Experts for VQA on 3D Brain MRI
by: Vepa, Arvind Murari, et al.
Published: (2025)
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
by: Li, Yifan, et al.
Published: (2024)
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
by: Qian, Yusu, et al.
Published: (2024)
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
by: Mitra, Chancharik, et al.
Published: (2024)
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
by: Wang, Weizhi, et al.
Published: (2024)
Do Multimodal Large Language Models Understand Welding?
by: Khvatskii, Grigorii, et al.
Published: (2025)
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
by: Yuan, Qianhao, et al.
Published: (2025)
MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts
by: Liang, Hao, et al.
Published: (2024)
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
by: Liang, Yuci, et al.
Published: (2024)
Global Position Aware Group Choreography using Large Language Model
by: Pang, Haozhou, et al.
Published: (2025)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)
Model Composition for Multimodal Large Language Models
by: Chen, Chi, et al.
Published: (2024)
Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models
by: Yuxuan, Cao, et al.
Published: (2025)
E$^2$AT: Multimodal Jailbreak Defense via Dynamic Joint Optimization for Multimodal Large Language Models
by: Lu, Liming, et al.
Published: (2025)
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
by: Zhong, Weihong, et al.
Published: (2024)
VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
by: Xiao, Wenyi, et al.
Published: (2026)