| Main Authors: | Zhang, Xiaofeng, Zeng, Fanshuo, Quan, Yihao, Hui, Zheng, Yao, Jiawei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.09817 |
Similar Items
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
by: Wang, Weiyun, et al.
Published: (2024)
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
by: Zhang, Zefeng, et al.
Published: (2025)
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
by: Hua, Jiacheng, et al.
Published: (2026)
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
by: Yuan, Qianhao, et al.
Published: (2025)
Mechanistic Diagnostics of Spatial Lexical Bias in Multimodal Large Language Model Spatial Reasoning
by: Ma, Chuang, et al.
Published: (2026)
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
by: Chen, Jiaxing, et al.
Published: (2024)
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
by: Liu, Jianyu, et al.
Published: (2025)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
by: Li, Yunxin, et al.
Published: (2025)
E$^2$AT: Multimodal Jailbreak Defense via Dynamic Joint Optimization for Multimodal Large Language Models
by: Lu, Liming, et al.
Published: (2025)
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models
by: Zhan, Xiaoyu, et al.
Published: (2025)
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
by: Yao, Huanjin, et al.
Published: (2025)
MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
by: Jia, Fankai, et al.
Published: (2025)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
by: Xu, Jiacong, et al.
Published: (2025)
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
by: Zhang, Xiaofeng, et al.
Published: (2024)
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
by: Xu, Shilin, et al.
Published: (2025)
Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding
by: Li, Yun, et al.
Published: (2025)
TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning
by: Liu, Daixian, et al.
Published: (2026)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
by: Shang, Yuying, et al.
Published: (2024)
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
by: Guo, Zichun, et al.
Published: (2026)
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
by: Wang, Yuqing, et al.
Published: (2023)
Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models
by: Zhou, Qiji, et al.
Published: (2024)
Model Composition for Multimodal Large Language Models
by: Chen, Chi, et al.
Published: (2024)
Multimodal Chain-of-Thought Reasoning in Language Models
by: Zhang, Zhuosheng, et al.
Published: (2023)
MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
by: Xia, Yinan, et al.
Published: (2025)
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
by: Zhang, Jingyi, et al.
Published: (2025)
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
by: Liu, Yulong, et al.
Published: (2024)
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
by: You, Haoxuan, et al.
Published: (2023)
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
by: Pi, Renjie, et al.
Published: (2024)
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
by: Sun, Haoyuan, et al.
Published: (2025)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
by: Diao, Xingjian, et al.
Published: (2026)
Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)
HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation
by: Liang, Yihao, et al.
Published: (2026)
A Survey on Agentic Multimodal Large Language Models
by: Yao, Huanjin, et al.
Published: (2025)
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
by: Wang, Hongyu, et al.
Published: (2024)
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
by: Wang, Yuxuan, et al.
Published: (2024)
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
by: Hu, Jinyi, et al.
Published: (2023)
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends
by: Tami, Mohammad Abu, et al.
Published: (2025)
MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models
by: Sharma, Harshita, et al.
Published: (2024)