| Main Authors: | Wu, Zhiheng, Wang, Tong, Wang, Shuning, Liu, Naiming, Zhang, Yumeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.24339 |
Similar Items
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
by: Yu, Seonghoon, et al.
Published: (2026)
Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
by: Chen, Kaitao, et al.
Published: (2025)
Watch Wider and Think Deeper: Collaborative Cross-modal Chain-of-Thought for Complex Visual Reasoning
by: Lu, Wenting, et al.
Published: (2026)
Enhancing Advanced Visual Reasoning Ability of Large Language Models
by: Li, Zhiyuan, et al.
Published: (2024)
Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
by: Wang, Haozhe, et al.
Published: (2026)
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
by: Liu, Chengzhi, et al.
Published: (2025)
See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent
by: Tang, Tianci, et al.
Published: (2026)
MediSee: Reasoning-based Pixel-level Perception in Medical Images
by: Tong, Qinyue, et al.
Published: (2025)
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
by: Wu, Junfei, et al.
Published: (2025)
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)
MedHorizon: Towards Long-context Medical Video Understanding in the Wild
by: Du, Bodong, et al.
Published: (2026)
Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification
by: Xu, Qin, et al.
Published: (2025)
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
by: Wang, Zhaozhi, et al.
Published: (2025)
VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies
by: Gao, Mingjian, et al.
Published: (2026)
TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
by: Liu, Junhua, et al.
Published: (2026)
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT
by: Dong, Zhuobai, et al.
Published: (2025)
Belief-Aware VLM Model for Human-like Reasoning
by: Nayak, Anshul, et al.
Published: (2026)
Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
by: Wang, Wentao, et al.
Published: (2025)
Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization
by: Fel, Thomas, et al.
Published: (2023)
Thinking with Gaze: Sequential Eye-Tracking as Visual Reasoning Supervision for Medical VLMs
by: Li, Yiwei, et al.
Published: (2026)
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
by: Duan, Chengqi, et al.
Published: (2025)
Chatting with Images for Introspective Visual Thinking
by: Wu, Junfei, et al.
Published: (2026)
SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding
by: Lin, Jiawen, et al.
Published: (2025)
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
by: Liu, Zhining, et al.
Published: (2025)
Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking
by: Zhan, Yufei, et al.
Published: (2025)
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
by: Jing, Miao, et al.
Published: (2025)
Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)
Think Visually, Reason Textually: Vision-Language Synergy in ARC
by: Zhang, Beichen, et al.
Published: (2025)
Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)
Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models
by: Peng, Ruiying, et al.
Published: (2026)
See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition
by: Si, Chongjie, et al.
Published: (2024)
VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models
by: Wu, Kui, et al.
Published: (2025)
Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
by: Ma, Xueqi, et al.
Published: (2026)
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
by: Wu, Yixuan, et al.
Published: (2024)
InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding
by: Zhang, Huaxiang, et al.
Published: (2024)
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
by: Xiao, Xin, et al.
Published: (2024)
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
by: Pang, Yuqi, et al.
Published: (2025)