| Main Authors: | Du, Yifan, Zhou, Kun, Min, Yingqian, Ling, Yue, Zhao, Wayne Xin, Wu, Youbin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.22586 |
Similar Items
Improving Vision-language Models with Perception-centric Process Reward Models
by: Min, Yingqian, et al.
Published: (2026)
Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning
by: Li, Yifan, et al.
Published: (2025)
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
by: Man, Yunze, et al.
Published: (2025)
Multimodal Lengthy Videos Retrieval Framework and Evaluation Metric
by: Eltahir, Mohamed, et al.
Published: (2025)
MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval
by: Ge, Xuri, et al.
Published: (2026)
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)
Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)
AVC-DPO: Aligned Video Captioning via Direct Preference Optimization
by: Tang, Jiyang, et al.
Published: (2025)
RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
by: Wen, Junwei, et al.
Published: (2026)
Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
by: Pham, Tan-Hanh, et al.
Published: (2025)
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
by: Du, Yifan, et al.
Published: (2023)
Thinking with Images as Continuous Actions: Numerical Visual Chain-of-Thought
by: Zhao, Kesen, et al.
Published: (2026)
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
by: Wang, Yaoting, et al.
Published: (2025)
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
by: Zhao, Kesen, et al.
Published: (2025)
DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2026)
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
by: Li, Sijing, et al.
Published: (2026)
AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
by: Li, Xiping, et al.
Published: (2025)
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
by: Zuo, Jing, et al.
Published: (2026)
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
by: Li, Yifan, et al.
Published: (2025)
Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)
Analyzing and Mitigating Object Hallucination: A Training Bias Perspective
by: Li, Yifan, et al.
Published: (2025)
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
by: Li, Yifan, et al.
Published: (2024)
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)
GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning
by: Liu, Ruiheng, et al.
Published: (2026)
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)
Explainable Action Form Assessment by Exploiting Multimodal Chain-of-Thoughts Reasoning
by: Qi, Mengshi, et al.
Published: (2025)
Chain-of-Cooking:Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance
by: Xu, Mengling, et al.
Published: (2025)
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
by: Nai, Ruiqian, et al.
Published: (2024)
ReasoningTrack: Chain-of-Thought Reasoning for Long-term Vision-Language Tracking
by: Wang, Xiao, et al.
Published: (2025)
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
by: Qin, Luozheng, et al.
Published: (2025)
Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
by: Wu, Yuan, et al.
Published: (2026)
CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation
by: Jiang, Yue, et al.
Published: (2024)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
MedCoT: Medical Chain of Thought via Hierarchical Expert
by: Liu, Jiaxiang, et al.
Published: (2024)
Live-E2T: Real-time Threat Monitoring in Video via Deduplicated Event Reasoning and Chain-of-Thought
by: Wang, Yuhan, et al.
Published: (2025)
FreeFly-Thinking : Aligning Chain-of-Thought Reasoning with Continuous UAV Navigation
by: Zhou, Jiaxu, et al.
Published: (2026)
Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)
Generative Universal Verifier as Multimodal Meta-Reasoner
by: Zhang, Xinchen, et al.
Published: (2025)
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
by: Chen, Qiguang, et al.
Published: (2025)