Library Catalog

Bibliographic Details
Main Authors:	Zhu, Hanlin; Hao, Shibo; Hu, Zhiting; Jiao, Jiantao; Russell, Stuart; Tian, Yuandong
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2509.23365

Similar Items

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
by: Zhu, Hanlin, et al.
Published: (2025)

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
by: Zhu, Hanlin, et al.
Published: (2024)

Transformers Provably Learn to Internalize Chain-of-Thought
by: Huang, Yixiao, et al.
Published: (2026)

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)

Efficient Prompt Caching via Embedding Similarity
by: Zhu, Hanlin, et al.
Published: (2024)

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
by: Zhu, Hanlin, et al.
Published: (2025)

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
by: Huang, Yixiao, et al.
Published: (2025)

On Representation Complexity of Model-based and Model-free Reinforcement Learning
by: Zhu, Hanlin, et al.
Published: (2023)

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
by: Hao, Shibo, et al.
Published: (2023)

Avoiding Catastrophe in Online Learning by Asking for Help
by: Plaut, Benjamin, et al.
Published: (2024)

Safe Learning Under Irreversible Dynamics via Asking for Help
by: Plaut, Benjamin, et al.
Published: (2025)

Training Large Language Models to Reason in a Continuous Latent Space
by: Hao, Shibo, et al.
Published: (2024)

LLM Pretraining with Continuous Concepts
by: Tack, Jihoon, et al.
Published: (2025)

Deep Thinking by Markov Chain of Continuous Thoughts
by: Liu, Jiayu, et al.
Published: (2025)

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
by: Xie, Zixuan, et al.
Published: (2026)

Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods
by: Hu, Xinyang, et al.
Published: (2024)

Unveiling Confirmation Bias in Chain-of-Thought Reasoning
by: Wan, Yue, et al.
Published: (2025)

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
by: Rashidinejad, Paria, et al.
Published: (2024)

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision
by: Gu, Hongyu, et al.
Published: (2026)

Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)

Constraint-Rectified Training for Efficient Chain-of-Thought
by: Wu, Qinhang, et al.
Published: (2026)

Continuous Chain of Thought Enables Parallel Exploration and Reasoning
by: Gozeten, Halil Alperen, et al.
Published: (2025)

Towards Optimal Statistical Watermarking
by: Huang, Baihe, et al.
Published: (2023)

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)

Chain-of-Thought Predictive Control
by: Jia, Zhiwei, et al.
Published: (2023)

Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought
by: Havrilla, Alex, et al.
Published: (2024)

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
by: Zhao, Jiawei, et al.
Published: (2024)

Toward a Theory of Tokenization in LLMs
by: Rajaraman, Nived, et al.
Published: (2024)

dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
by: Chen, Shirui, et al.
Published: (2025)

Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
by: Zhu, Banghua, et al.
Published: (2023)

Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
by: Zhu, Zehan, et al.
Published: (2023)

Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
by: Wang, Libo
Published: (2025)

RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
by: Tan, Bowen, et al.
Published: (2023)

Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
by: Tang, Yuntian, et al.
Published: (2026)

GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection
by: Su, DiJia, et al.
Published: (2025)

Faithfulness as Information Flow: Evaluating and Training Faithful Chain-of-Thought Reasoning
by: Jia, Jinghan, et al.
Published: (2026)

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)

Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)

Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving
by: Yang, Tianyun, et al.
Published: (2025)