| Main Authors: | Chen, Thomas; Ma, Tengyu; Li, Zhiyuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.03085 |
Similar Items
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
by: Li, Zhiyuan, et al.
Published: (2024)
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
by: Liu, Hong, et al.
Published: (2023)
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
by: Wen, Kaiyue, et al.
Published: (2024)
Linguistic Calibration of Long-Form Generations
by: Band, Neil, et al.
Published: (2024)
Configuration-to-Performance Scaling Law with Neural Ansatz
by: Zhang, Huaqing, et al.
Published: (2026)
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
by: Dong, Kefan, et al.
Published: (2024)
Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning
by: Mahankali, Arvind, et al.
Published: (2026)
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
by: Dong, Kefan, et al.
Published: (2025)
Fantastic Pretraining Optimizers and Where to Find Them
by: Wen, Kaiyue, et al.
Published: (2025)
A Theoretical Framework for Self-Play Theorem Proving Algorithms
by: Chen, Thomas, et al.
Published: (2026)
Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models
by: Li, Gen, et al.
Published: (2023)
Scaling Self-Play with Self-Guidance
by: Bailey, Luke, et al.
Published: (2026)
Large Language Models as Tool Makers
by: Cai, Tianle, et al.
Published: (2023)
Flight Trajectory Prediction Using an Enhanced CNN-LSTM Network
by: Hao, Qinzhi, et al.
Published: (2024)
Fighter flight trajectory prediction based on spatio-temporal graphcial attention network
by: Sun, Yao, et al.
Published: (2024)
Pseudo-Formalization for Automatic Proof Verification
by: Barkallah, Slim, et al.
Published: (2026)
Looped Transformers for Length Generalization
by: Fan, Ying, et al.
Published: (2024)
Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
by: Chen, Yang, et al.
Published: (2024)
On Vanishing Variance in Transformer Length Generalization
by: Li, Ruining, et al.
Published: (2025)
Non-Asymptotic Analysis of (Sticky) Track-and-Stop
by: Poiani, Riccardo, et al.
Published: (2025)
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
by: Yao, Yunzhen, et al.
Published: (2025)
Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework
by: Chen, Zaiwei, et al.
Published: (2026)
Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)
On the Limitations and Capabilities of Position Embeddings for Length Generalization
by: Chen, Yang, et al.
Published: (2025)
Mamba Modulation: On the Length Generalization of Mamba
by: Lu, Peng, et al.
Published: (2025)
Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula
by: Vilucchio, Matteo, et al.
Published: (2025)
Comparative Study on Semi-supervised Learning Applied for Anomaly Detection in Hydraulic Condition Monitoring System
by: Dong, Yongqi, et al.
Published: (2023)
Quantitative Bounds for Length Generalization in Transformers
by: Izzo, Zachary, et al.
Published: (2025)
Universal Length Generalization with Turing Programs
by: Hou, Kaiying, et al.
Published: (2024)
Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models
by: Cayci, Semih
Published: (2025)
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
by: Xie, Shuo, et al.
Published: (2025)
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
by: Cho, Hanseul, et al.
Published: (2024)
On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)
Understanding and Improving Length Generalization in Recurrent Models
by: Ruiz, Ricardo Buitrago, et al.
Published: (2025)
Learning Variable-Length Tokenization for Generative Recommendation
by: Wang, Minhao, et al.
Published: (2026)
Length Generalization with Log-Depth Recurrent Units
by: Pert, Charles, et al.
Published: (2026)
Provably Minimum-Length Conformal Prediction Sets for Ordinal Classification
by: Zhang, Zijian, et al.
Published: (2025)
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
by: Shaw, Peter, et al.
Published: (2025)
Non-Asymptotic Global Convergence of PPO-Clip
by: Liu, Yin, et al.
Published: (2025)
No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
by: Mani, Pranav, et al.
Published: (2025)