:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Wenting; Zhu, Didi; Shen, Tao; Zhu, Donglin; Ye, Ayong; Wu, Chao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.02422

Similar Items

RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
by: Lu, Yi, et al.
Published: (2025)

Thinking with Images as Continuous Actions: Numerical Visual Chain-of-Thought
by: Zhao, Kesen, et al.
Published: (2026)

WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval
by: Wang, Tianyue, et al.
Published: (2026)

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
by: Zhao, Kesen, et al.
Published: (2025)

Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)

FreeFly-Thinking: Aligning Chain-of-Thought Reasoning with Continuous UAV Navigation
by: Zhou, Jiaxu, et al.
Published: (2026)

See Further, Think Deeper: Advancing VLM's Reasoning Ability with Low-level Visual Cues and Reflection
by: Wu, Zhiheng, et al.
Published: (2026)

Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
by: Zhou, Yiyang, et al.
Published: (2025)

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
by: Gu, Jiawei, et al.
Published: (2025)

Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning
by: Guo, Guangfu, et al.
Published: (2026)

Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale
by: Acuna, David, et al.
Published: (2025)

Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models
by: Peng, Ruiying, et al.
Published: (2026)

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
by: Wu, Linquan, et al.
Published: (2026)

Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)

ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
by: Feng, Tao, et al.
Published: (2025)

MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
by: Kumar, Somnath, et al.
Published: (2024)

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
by: Li, Hao, et al.
Published: (2023)

Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
by: Kao, Shiu-hong, et al.
Published: (2025)

ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation
by: Tan, Jianwen, et al.
Published: (2025)

Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance
by: Xu, Mengling, et al.
Published: (2025)

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)

Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
by: He, Hulingxiao, et al.
Published: (2026)

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)

RGBX-R1: Visual Modality Chain-of-Thought Guided Reinforcement Learning for Multimodal Grounding
by: Wu, Jiahe, et al.
Published: (2026)

Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
by: Ma, Qihang, et al.
Published: (2025)

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought
by: Huo, Yu, et al.
Published: (2026)

PixelThink: Towards Efficient Chain-of-Pixel Reasoning
by: Wang, Song, et al.
Published: (2025)

AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
by: Qian, Kangan, et al.
Published: (2025)

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
by: Huang, Chao, et al.
Published: (2025)

Generative Visual Chain-of-Thought for Image Editing
by: Yin, Zijin, et al.
Published: (2026)

VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
by: Fan, Chengxiang, et al.
Published: (2024)

Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization
by: Zhu, Zhiyi, et al.
Published: (2025)

RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
by: Wen, Junwei, et al.
Published: (2026)

Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
by: Chen, Xinyan, et al.
Published: (2025)

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
by: Wang, Yaoting, et al.
Published: (2025)

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
by: Guan, Yiran, et al.
Published: (2026)