| Main Authors: | Wang, Peiyao, Xu, Haotian, Vesdapunt, Noranart, Hou, Rui, Zhang, Jingyi, Ling, Haibin, Obiednikov, Oleksandr, Zhou, Ning, Fu, Kah Kuen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.25942 |
Similar Items
$Δ$ynamics: Language-Based Representation for Inferring Rigid-Body Dynamics From Videos
by: Kao, Chia-Hsiang, et al.
Published: (2026)
SVQA-R1: Reinforcing Spatial Reasoning in MLLMs via View-Consistent Reward Optimization
by: Wang, Peiyao, et al.
Published: (2025)
Efficient Temporal Action Segmentation via Boundary-aware Query Voting
by: Wang, Peiyao, et al.
Published: (2024)
RCoT-Seg: Reinforced Chain-of-Thought for Video Reasoning and Segmentation
by: Wen, Junwei, et al.
Published: (2026)
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
by: Fu, Xingyu, et al.
Published: (2025)
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
by: Zhang, Shuyi, et al.
Published: (2025)
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
by: Cheng, Zihui, et al.
Published: (2025)
ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
by: Zhang, Yongheng, et al.
Published: (2025)
VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool
by: Wang, Yan, et al.
Published: (2024)
S-Chain: Structured Visual Chain-of-Thought For Medicine
by: Le-Duc, Khai, et al.
Published: (2025)
Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models
by: Ma, Ji, et al.
Published: (2026)
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
by: Xia, Jiaer, et al.
Published: (2025)
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
by: Du, Yifan, et al.
Published: (2025)
Rethinking Chain-of-Thought Reasoning for Videos
by: Zhong, Yiwu, et al.
Published: (2025)
From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
by: Niu, Ke, et al.
Published: (2025)
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
by: Tang, Yolo Yunlong, et al.
Published: (2024)
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
by: Kao, Shiu-hong, et al.
Published: (2025)
C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
by: Li, Yuhao, et al.
Published: (2025)
Chain of Event-Centric Causal Thought for Physically Plausible Video Generation
by: Wang, Zixuan, et al.
Published: (2026)
Dynamic Token Compression for Efficient Video Understanding through Reinforcement Learning
by: Wang, Shida, et al.
Published: (2026)
SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark
by: Wang, Gui, et al.
Published: (2026)
Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
by: Li, Yunheng, et al.
Published: (2026)
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
by: Shi, Weikang, et al.
Published: (2025)
PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection
by: Wang, Peiyao, et al.
Published: (2025)
ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation
by: Tan, Jianwen, et al.
Published: (2025)
Chain-of-Cooking:Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance
by: Xu, Mengling, et al.
Published: (2025)
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
by: Wang, Yibin, et al.
Published: (2025)
StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
by: Hu, Yuhang, et al.
Published: (2025)
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
by: Li, Sijing, et al.
Published: (2026)
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
by: Fu, Honghao, et al.
Published: (2026)
CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks
by: Wang, Yanan, et al.
Published: (2025)
Generative Visual Chain-of-Thought for Image Editing
by: Yin, Zijin, et al.
Published: (2026)
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
by: Gao, Timin, et al.
Published: (2024)
VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)
Video-CoE: Reinforcing Video Event Prediction via Chain of Events
by: Su, Qile, et al.
Published: (2026)
MedCoT: Medical Chain of Thought via Hierarchical Expert
by: Liu, Jiaxiang, et al.
Published: (2024)
CoS: Chain-of-Shot Prompting for Long Video Understanding
by: Hu, Jian, et al.
Published: (2025)
Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner
by: Chen, Lei, et al.
Published: (2025)
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
by: Han, Songhao, et al.
Published: (2024)