Library Catalog

Bibliographic Details
Main Authors:	Wang, Peiyao; Xu, Haotian; Vesdapunt, Noranart; Hou, Rui; Zhang, Jingyi; Ling, Haibin; Obiednikov, Oleksandr; Zhou, Ning; Fu, Kah Kuen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.25942

Similar Items

$Δ$ynamics: Language-Based Representation for Inferring Rigid-Body Dynamics From Videos
by: Kao, Chia-Hsiang, et al.
Published: (2026)

SVQA-R1: Reinforcing Spatial Reasoning in MLLMs via View-Consistent Reward Optimization
by: Wang, Peiyao, et al.
Published: (2025)

Efficient Temporal Action Segmentation via Boundary-aware Query Voting
by: Wang, Peiyao, et al.
Published: (2024)

RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
by: Wen, Junwei, et al.
Published: (2026)

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
by: Fu, Xingyu, et al.
Published: (2025)

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
by: Zhang, Shuyi, et al.
Published: (2025)

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
by: Cheng, Zihui, et al.
Published: (2025)

ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
by: Zhang, Yongheng, et al.
Published: (2025)

VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool
by: Wang, Yan, et al.
Published: (2024)

S-Chain: Structured Visual Chain-of-Thought For Medicine
by: Le-Duc, Khai, et al.
Published: (2025)

Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models
by: Ma, Ji, et al.
Published: (2026)

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
by: Xia, Jiaer, et al.
Published: (2025)

Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
by: Du, Yifan, et al.
Published: (2025)

Rethinking Chain-of-Thought Reasoning for Videos
by: Zhong, Yiwu, et al.
Published: (2025)

From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
by: Niu, Ke, et al.
Published: (2025)

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
by: Tang, Yolo Yunlong, et al.
Published: (2024)

CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
by: Kao, Shiu-hong, et al.
Published: (2025)

C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
by: Li, Yuhao, et al.
Published: (2025)

Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
by: Wang, Zixuan, et al.
Published: (2026)

Dynamic Token Compression for Efficient Video Understanding through Reinforcement Learning
by: Wang, Shida, et al.
Published: (2026)

SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark
by: Wang, Gui, et al.
Published: (2026)

Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
by: Li, Yunheng, et al.
Published: (2026)

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)

PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection
by: Wang, Peiyao, et al.
Published: (2025)

ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation
by: Tan, Jianwen, et al.
Published: (2025)

Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance
by: Xu, Mengling, et al.
Published: (2025)

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
by: Wang, Yibin, et al.
Published: (2025)

StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
by: Hu, Yuhang, et al.
Published: (2025)

TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
by: Li, Sijing, et al.
Published: (2026)

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
by: Fu, Honghao, et al.
Published: (2026)

CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks
by: Wang, Yanan, et al.
Published: (2025)

Generative Visual Chain-of-Thought for Image Editing
by: Yin, Zijin, et al.
Published: (2026)

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
by: Gao, Timin, et al.
Published: (2024)

VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)

Video-CoE: Reinforcing Video Event Prediction via Chain of Events
by: Su, Qile, et al.
Published: (2026)

MedCoT: Medical Chain of Thought via Hierarchical Expert
by: Liu, Jiaxiang, et al.
Published: (2024)

CoS: Chain-of-Shot Prompting for Long Video Understanding
by: Hu, Jian, et al.
Published: (2025)

Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner
by: Chen, Lei, et al.
Published: (2025)

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
by: Han, Songhao, et al.
Published: (2024)