:: Library Catalog

Bibliographic Details
Main Authors:	Silvestri, Gianluigi; Cetin, Edoardo
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.13274

Similar Items

Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics
by: Sheng, Leheng, et al.
Published: (2026)

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
by: Kim, Juno, et al.
Published: (2025)

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
by: Yu, Bowen, et al.
Published: (2026)

Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)

Scalable Chain of Thoughts via Elastic Reasoning
by: Xu, Yuhui, et al.
Published: (2025)

The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
by: Pengmei, Zihan, et al.
Published: (2025)

Fractured Chain-of-Thought Reasoning
by: Liao, Baohao, et al.
Published: (2025)

Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention
by: Phuong, Nguyen Minh, et al.
Published: (2026)

Unveiling Confirmation Bias in Chain-of-Thought Reasoning
by: Wan, Yue, et al.
Published: (2025)

Reinforcement Learning Teachers of Test Time Scaling
by: Cetin, Edoardo, et al.
Published: (2025)

Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
by: Wang, Libo
Published: (2025)

Verifying Chain-of-Thought Reasoning via Its Computational Graph
by: Zhao, Zheng, et al.
Published: (2025)

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
by: Ye, Jiacheng, et al.
Published: (2024)

Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
by: Huang, Yu, et al.
Published: (2025)

LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)

Understanding Reasoning in Chain-of-Thought from the Hopfieldian View
by: Hu, Lijie, et al.
Published: (2024)

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
by: Arcuschin, Iván, et al.
Published: (2025)

Compositional Reasoning with Transformers, RNNs, and Chain of Thought
by: Yehudai, Gilad, et al.
Published: (2025)

Long Chain-of-Thought Reasoning Across Languages
by: Barua, Josh, et al.
Published: (2025)

Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
by: Ahmed, Ammar, et al.
Published: (2025)

In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
by: Huang, Sili, et al.
Published: (2024)

Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning
by: Parekh, Swapnil
Published: (2026)

Reinforcement Learning via Self-Distillation
by: Hübotter, Jonas, et al.
Published: (2026)

Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
by: Kajitsuka, Tokio, et al.
Published: (2026)

CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
by: Park, Jueon, et al.
Published: (2025)

Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
by: Zabounidis, Renos, et al.
Published: (2025)

Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
by: Mao, Zhenjiang, et al.
Published: (2025)

Think When You Need: Self-Adaptive Chain-of-Thought Learning
by: Yang, Junjie, et al.
Published: (2025)

GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
by: Yerramilli, Sahiti, et al.
Published: (2025)

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
by: Boppana, Siddharth, et al.
Published: (2026)

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning
by: Li, Xintong, et al.
Published: (2026)

Value-Guided Search for Efficient Chain-of-Thought Reasoning
by: Wang, Kaiwen, et al.
Published: (2025)

Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
by: Yin, Maxwell J., et al.
Published: (2025)

TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
by: Nagle, Alliot, et al.
Published: (2026)

FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
by: Xie, Zhuohan, et al.
Published: (2025)

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
by: Chen, Changyu, et al.
Published: (2024)

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
by: Yao, Jiarui, et al.
Published: (2025)

Large Language Models to Diffusion Finetuning
by: Cetin, Edoardo, et al.
Published: (2025)

Simple Ingredients for Offline Reinforcement Learning
by: Cetin, Edoardo, et al.
Published: (2024)

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
by: Lou, Chenwei, et al.
Published: (2025)