| Main Authors: | Lu, Wenting, Zhu, Didi, Shen, Tao, Zhu, Donglin, Ye, Ayong, Wu, Chao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.02422 |
Similar Items
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
by: Lu, Yi, et al.
Published: (2025)
Thinking with Images as Continuous Actions: Numerical Visual Chain-of-Thought
by: Zhao, Kesen, et al.
Published: (2026)
WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval
by: Wang, Tianyue, et al.
Published: (2026)
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
by: Zhao, Kesen, et al.
Published: (2025)
Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)
FreeFly-Thinking: Aligning Chain-of-Thought Reasoning with Continuous UAV Navigation
by: Zhou, Jiaxu, et al.
Published: (2026)
See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)
Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
by: Zhou, Yiyang, et al.
Published: (2025)
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
by: Gu, Jiawei, et al.
Published: (2025)
Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)
Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale
by: Acuna, David, et al.
Published: (2025)
Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models
by: Peng, Ruiying, et al.
Published: (2026)
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
by: Wu, Linquan, et al.
Published: (2026)
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)
ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
by: Feng, Tao, et al.
Published: (2025)
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
by: Kumar, Somnath, et al.
Published: (2024)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
by: Li, Hao, et al.
Published: (2023)
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
by: Kao, Shiu-hong, et al.
Published: (2025)
ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation
by: Tan, Jianwen, et al.
Published: (2025)
Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance
by: Xu, Mengling, et al.
Published: (2025)
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
by: He, Hulingxiao, et al.
Published: (2026)
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)
RGBX-R1: Visual Modality Chain-of-Thought Guided Reinforcement Learning for Multimodal Grounding
by: Wu, Jiahe, et al.
Published: (2026)
Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
by: Ma, Qihang, et al.
Published: (2025)
Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought
by: Huo, Yu, et al.
Published: (2026)
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
by: Wang, Song, et al.
Published: (2025)
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
by: Qian, Kangan, et al.
Published: (2025)
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
by: Huang, Chao, et al.
Published: (2025)
Generative Visual Chain-of-Thought for Image Editing
by: Yin, Zijin, et al.
Published: (2026)
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
by: Fan, Chengxiang, et al.
Published: (2024)
Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization
by: Zhu, Zhiyi, et al.
Published: (2025)
RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
by: Wen, Junwei, et al.
Published: (2026)
Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
by: Chen, Xinyan, et al.
Published: (2025)
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
by: Wang, Yaoting, et al.
Published: (2025)
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
by: Guan, Yiran, et al.
Published: (2026)