| Main Authors: | Sharma, Sourabh; Gupta, Sonam; Sadbhawna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.02456 |
Similar Items
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
by: Liu, Chengzhi, et al.
Published: (2025)
Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts
by: Xu, Haolei, et al.
Published: (2026)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
by: Li, Yunxin, et al.
Published: (2025)
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models
by: Wu, Juncheng, et al.
Published: (2026)
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding
by: Yang, Jianjiang, et al.
Published: (2025)
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
by: Yu, Seonghoon, et al.
Published: (2026)
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
by: Tong, Jingqi, et al.
Published: (2025)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
by: Liang, Hao, et al.
Published: (2025)
Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large Language Models
by: Caffagni, Davide, et al.
Published: (2025)
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
by: Mathur, Suyash Vardhan, et al.
Published: (2024)
See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation
by: Rizwan, Naquee, et al.
Published: (2026)
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
by: Yuan, Qianhao, et al.
Published: (2026)
Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era
by: Oneata, Dan, et al.
Published: (2025)
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
by: Tao, Xingjian, et al.
Published: (2026)
Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
by: Chung, Jiwan, et al.
Published: (2025)
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
by: Yang, Jihan, et al.
Published: (2024)
Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction
by: Hu, Juncheng, et al.
Published: (2026)
Seeing Culture: A Benchmark for Visual Reasoning and Grounding
by: Satar, Burak, et al.
Published: (2025)
BLINK: Multimodal Large Language Models Can See but Not Perceive
by: Fu, Xingyu, et al.
Published: (2024)
Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
by: Lai, Zhengzhao, et al.
Published: (2025)
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
by: Sun, Kaiser, et al.
Published: (2026)
Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)
LaRe: Latent Refocusing for Multimodal Reasoning
by: Ma, Jizheng, et al.
Published: (2025)
Reinforcing Multimodal Reasoning Against Visual Degradation
by: Liu, Rui, et al.
Published: (2026)
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
by: Wang, Wenxuan, et al.
Published: (2025)
Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
by: Wu, Jiaying, et al.
Published: (2025)
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
by: Chen, Liang, et al.
Published: (2025)
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
by: Wang, Lu, et al.
Published: (2026)
From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning
by: Sharif, Omar, et al.
Published: (2026)
One RL to See Them All: Visual Triple Unified Reinforcement Learning
by: Ma, Yan, et al.
Published: (2025)
Diving into Self-Evolving Training for Multimodal Reasoning
by: Liu, Wei, et al.
Published: (2024)
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
by: Li, Yifan, et al.
Published: (2025)
Probabilistic Concept Graph Reasoning for Multimodal Misinformation Detection
by: Yang, Ruichao, et al.
Published: (2026)
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
by: Hua, Jiacheng, et al.
Published: (2026)
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
by: Gan, Ziliang, et al.
Published: (2024)
Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training
by: Zhong, Qihuang, et al.
Published: (2026)
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning
by: Chen, Xiuwei, et al.
Published: (2025)
Thinking with Programming Vision: Towards a Unified View for Thinking with Images
by: Guo, Zirun, et al.
Published: (2025)