:: Library Catalog

Bibliographic Details
Main Authors:	Qiu, Longtian; Ning, Shan; Sun, Jiaxuan; He, Xuming
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.21122

Similar Items

Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
by: Ning, Shan, et al.
Published: (2026)

WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition
by: Ning, Shan, et al.
Published: (2026)

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
by: Yao, Huanjin, et al.
Published: (2025)

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
by: Tong, Chengzhuo, et al.
Published: (2025)

Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
by: Ou, Linyu, et al.
Published: (2025)

Noisy Deep Ensemble: Accelerating Deep Ensemble Learning via Noise Injection
by: Sakai, Shunsuke, et al.
Published: (2025)

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
by: Pei, Baoqi, et al.
Published: (2025)

AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
by: Li, Xiping, et al.
Published: (2025)

Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization
by: Fang, Hao, et al.
Published: (2026)

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
by: Lin, Weihuang, et al.
Published: (2025)

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
by: Gao, Minghe, et al.
Published: (2025)

X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning
by: Ng, Chee, et al.
Published: (2025)

CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
by: Kao, Shiu-hong, et al.
Published: (2025)

CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction
by: Kao, Shiu-hong, et al.
Published: (2026)

ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking
by: Wang, Lihong, et al.
Published: (2025)

X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning
by: Pulakurthi, Prasanna Reddy, et al.
Published: (2025)

CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning
by: Song, Jeonghyo, et al.
Published: (2025)

ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL
by: Zhang, Yu, et al.
Published: (2025)

VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
by: Lim, Byeonggeuk, et al.
Published: (2026)

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
by: Qiu, Haonan, et al.
Published: (2023)

CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT
by: Du, Chengyi, et al.
Published: (2026)

Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance
by: Hu, Zhiyuan, et al.
Published: (2025)

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
by: Jiang, Dongzhi, et al.
Published: (2025)

MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels
by: Hu, Chuanyang, et al.
Published: (2023)

Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
by: Zhang, Yiming, et al.
Published: (2026)

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
by: Chen, Xinyan, et al.
Published: (2025)

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
by: Huang, Qihan, et al.
Published: (2025)

Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
by: Sun, Hai-Long, et al.
Published: (2025)

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
by: Qiu, Haibo, et al.
Published: (2025)

Automated Movie Generation via Multi-Agent CoT Planning
by: Wu, Weijia, et al.
Published: (2025)

DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
by: Cai, Zhenyang, et al.
Published: (2025)

NoiseSDF2NoiseSDF: Learning Clean Neural Fields from Noisy Supervision
by: Wang, Tengkai, et al.
Published: (2025)

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
by: Duan, Chengqi, et al.
Published: (2025)

MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
by: Wei, Lai, et al.
Published: (2024)

FakeVLM-R1: Internalizing Physical Laws via CoT for Synthetic Image Detection
by: Zhu, Leqi, et al.
Published: (2026)

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
by: Liao, Jiaqi, et al.
Published: (2025)

Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
by: Li, Ang, et al.
Published: (2025)