| Main Authors: | Qiu, Longtian, Ning, Shan, Sun, Jiaxuan, He, Xuming |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.21122 |
Similar Items
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
by: Ning, Shan, et al.
Published: (2026)
WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition
by: Ning, Shan, et al.
Published: (2026)
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
by: Yao, Huanjin, et al.
Published: (2025)
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
by: Tong, Chengzhuo, et al.
Published: (2025)
Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
by: Ou, Linyu, et al.
Published: (2025)
Noisy Deep Ensemble: Accelerating Deep Ensemble Learning via Noise Injection
by: Sakai, Shunsuke, et al.
Published: (2025)
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
by: Pei, Baoqi, et al.
Published: (2025)
AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
by: Li, Xiping, et al.
Published: (2025)
Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization
by: Fang, Hao, et al.
Published: (2026)
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
by: Lin, Weihuang, et al.
Published: (2025)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
by: Gao, Minghe, et al.
Published: (2025)
X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning
by: Ng, Chee, et al.
Published: (2025)
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
by: Kao, Shiu-hong, et al.
Published: (2025)
CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction
by: Kao, Shiu-hong, et al.
Published: (2026)
ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking
by: Wang, Lihong, et al.
Published: (2025)
X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning
by: Pulakurthi, Prasanna Reddy, et al.
Published: (2025)
CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning
by: Song, Jeonghyo, et al.
Published: (2025)
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL
by: Zhang, Yu, et al.
Published: (2025)
VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
by: Lim, Byeonggeuk, et al.
Published: (2026)
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
by: Qiu, Haonan, et al.
Published: (2023)
CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT
by: Du, Chengyi, et al.
Published: (2026)
Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance
by: Hu, Zhiyuan, et al.
Published: (2025)
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
by: Jiang, Dongzhi, et al.
Published: (2025)
MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels
by: Hu, Chuanyang, et al.
Published: (2023)
Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
by: Zhang, Yiming, et al.
Published: (2026)
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
by: Chen, Xinyan, et al.
Published: (2025)
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
by: Huang, Qihan, et al.
Published: (2025)
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
by: Sun, Hai-Long, et al.
Published: (2025)
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
by: Qiu, Haibo, et al.
Published: (2025)
Automated Movie Generation via Multi-Agent CoT Planning
by: Wu, Weijia, et al.
Published: (2025)
DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
by: Cai, Zhenyang, et al.
Published: (2025)
NoiseSDF2NoiseSDF: Learning Clean Neural Fields from Noisy Supervision
by: Wang, Tengkai, et al.
Published: (2025)
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
by: Duan, Chengqi, et al.
Published: (2025)
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
by: Wei, Lai, et al.
Published: (2024)
FakeVLM-R1: Internalizing Physical Laws via CoT for Synthetic Image Detection
by: Zhu, Leqi, et al.
Published: (2026)
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
by: Liao, Jiaqi, et al.
Published: (2025)
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
by: Li, Ang, et al.
Published: (2025)