| Main Authors: | Zhu, Hanlin, Hao, Shibo, Hu, Zhiting, Jiao, Jiantao, Russell, Stuart, Tian, Yuandong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.23365 |
Similar Items
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
by: Zhu, Hanlin, et al.
Published: (2025)
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
by: Zhu, Hanlin, et al.
Published: (2024)
Transformers Provably Learn to Internalize Chain-of-Thought
by: Huang, Yixiao, et al.
Published: (2026)
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)
Efficient Prompt Caching via Embedding Similarity
by: Zhu, Hanlin, et al.
Published: (2024)
Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)
GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
by: Zhu, Hanlin, et al.
Published: (2025)
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
by: Huang, Yixiao, et al.
Published: (2025)
On Representation Complexity of Model-based and Model-free Reinforcement Learning
by: Zhu, Hanlin, et al.
Published: (2023)
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
by: Hao, Shibo, et al.
Published: (2023)
Avoiding Catastrophe in Online Learning by Asking for Help
by: Plaut, Benjamin, et al.
Published: (2024)
Safe Learning Under Irreversible Dynamics via Asking for Help
by: Plaut, Benjamin, et al.
Published: (2025)
Training Large Language Models to Reason in a Continuous Latent Space
by: Hao, Shibo, et al.
Published: (2024)
LLM Pretraining with Continuous Concepts
by: Tack, Jihoon, et al.
Published: (2025)
Deep Thinking by Markov Chain of Continuous Thoughts
by: Liu, Jiayu, et al.
Published: (2025)
Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
by: Xie, Zixuan, et al.
Published: (2026)
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods
by: Hu, Xinyang, et al.
Published: (2024)
Unveiling Confirmation Bias in Chain-of-Thought Reasoning
by: Wan, Yue, et al.
Published: (2025)
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
by: Rashidinejad, Paria, et al.
Published: (2024)
Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision
by: Gu, Hongyu, et al.
Published: (2026)
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)
Constraint-Rectified Training for Efficient Chain-of-Thought
by: Wu, Qinhang, et al.
Published: (2026)
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
by: Gozeten, Halil Alperen, et al.
Published: (2025)
Towards Optimal Statistical Watermarking
by: Huang, Baihe, et al.
Published: (2023)
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)
Chain-of-Thought Predictive Control
by: Jia, Zhiwei, et al.
Published: (2023)
Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought
by: Havrilla, Alex, et al.
Published: (2024)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
by: Zhao, Jiawei, et al.
Published: (2024)
Toward a Theory of Tokenization in LLMs
by: Rajaraman, Nived, et al.
Published: (2024)
dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
by: Chen, Shirui, et al.
Published: (2025)
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
by: Zhu, Banghua, et al.
Published: (2023)
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
by: Zhu, Zehan, et al.
Published: (2023)
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
by: Wang, Libo
Published: (2025)
RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
by: Tan, Bowen, et al.
Published: (2023)
Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
by: Tang, Yuntian, et al.
Published: (2026)
GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection
by: Su, DiJia, et al.
Published: (2025)
Faithfulness as Information Flow: Evaluating and Training Faithful Chain-of-Thought Reasoning
by: Jia, Jinghan, et al.
Published: (2026)
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)
Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving
by: Yang, Tianyun, et al.
Published: (2025)