| Main Authors: | Silvestri, Gianluigi; Cetin, Edoardo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.13274 |
Similar Items
Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics
by: Sheng, Leheng, et al.
Published: (2026)
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
by: Kim, Juno, et al.
Published: (2025)
Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
by: Yu, Bowen, et al.
Published: (2026)
Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)
Scalable Chain of Thoughts via Elastic Reasoning
by: Xu, Yuhui, et al.
Published: (2025)
The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
by: Pengmei, Zihan, et al.
Published: (2025)
Fractured Chain-of-Thought Reasoning
by: Liao, Baohao, et al.
Published: (2025)
Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention
by: Phuong, Nguyen Minh, et al.
Published: (2026)
Unveiling Confirmation Bias in Chain-of-Thought Reasoning
by: Wan, Yue, et al.
Published: (2025)
Reinforcement Learning Teachers of Test Time Scaling
by: Cetin, Edoardo, et al.
Published: (2025)
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
by: Wang, Libo
Published: (2025)
Verifying Chain-of-Thought Reasoning via Its Computational Graph
by: Zhao, Zheng, et al.
Published: (2025)
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
by: Ye, Jiacheng, et al.
Published: (2024)
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
by: Huang, Yu, et al.
Published: (2025)
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)
Understanding Reasoning in Chain-of-Thought from the Hopfieldian View
by: Hu, Lijie, et al.
Published: (2024)
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
by: Arcuschin, Iván, et al.
Published: (2025)
Compositional Reasoning with Transformers, RNNs, and Chain of Thought
by: Yehudai, Gilad, et al.
Published: (2025)
Long Chain-of-Thought Reasoning Across Languages
by: Barua, Josh, et al.
Published: (2025)
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
by: Ahmed, Ammar, et al.
Published: (2025)
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
by: Huang, Sili, et al.
Published: (2024)
Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning
by: Parekh, Swapnil
Published: (2026)
Reinforcement Learning via Self-Distillation
by: Hübotter, Jonas, et al.
Published: (2026)
Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
by: Kajitsuka, Tokio, et al.
Published: (2026)
CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
by: Park, Jueon, et al.
Published: (2025)
Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
by: Zabounidis, Renos, et al.
Published: (2025)
Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
by: Mao, Zhenjiang, et al.
Published: (2025)
Think When You Need: Self-Adaptive Chain-of-Thought Learning
by: Yang, Junjie, et al.
Published: (2025)
GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
by: Yerramilli, Sahiti, et al.
Published: (2025)
Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
by: Boppana, Siddharth, et al.
Published: (2026)
Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning
by: Li, Xintong, et al.
Published: (2026)
Value-Guided Search for Efficient Chain-of-Thought Reasoning
by: Wang, Kaiwen, et al.
Published: (2025)
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
by: Yin, Maxwell J., et al.
Published: (2025)
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
by: Nagle, Alliot, et al.
Published: (2026)
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
by: Xie, Zhuohan, et al.
Published: (2025)
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
by: Chen, Changyu, et al.
Published: (2024)
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
by: Yao, Jiarui, et al.
Published: (2025)
Large Language Models to Diffusion Finetuning
by: Cetin, Edoardo, et al.
Published: (2025)
Simple Ingredients for Offline Reinforcement Learning
by: Cetin, Edoardo, et al.
Published: (2024)
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
by: Lou, Chenwei, et al.
Published: (2025)