:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Thomas; Ma, Tengyu; Li, Zhiyuan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.03085

Similar Items

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
by: Li, Zhiyuan, et al.
Published: (2024)

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
by: Liu, Hong, et al.
Published: (2023)

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
by: Wen, Kaiyue, et al.
Published: (2024)

Linguistic Calibration of Long-Form Generations
by: Band, Neil, et al.
Published: (2024)

Configuration-to-Performance Scaling Law with Neural Ansatz
by: Zhang, Huaqing, et al.
Published: (2026)

Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
by: Dong, Kefan, et al.
Published: (2024)

Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning
by: Mahankali, Arvind, et al.
Published: (2026)

STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
by: Dong, Kefan, et al.
Published: (2025)

Fantastic Pretraining Optimizers and Where to Find Them
by: Wen, Kaiyue, et al.
Published: (2025)

A Theoretical Framework for Self-Play Theorem Proving Algorithms
by: Chen, Thomas, et al.
Published: (2026)

Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models
by: Li, Gen, et al.
Published: (2023)

Scaling Self-Play with Self-Guidance
by: Bailey, Luke, et al.
Published: (2026)

Large Language Models as Tool Makers
by: Cai, Tianle, et al.
Published: (2023)

Flight Trajectory Prediction Using an Enhanced CNN-LSTM Network
by: Hao, Qinzhi, et al.
Published: (2024)

Fighter flight trajectory prediction based on spatio-temporal graphcial attention network
by: Sun, Yao, et al.
Published: (2024)

Pseudo-Formalization for Automatic Proof Verification
by: Barkallah, Slim, et al.
Published: (2026)

Looped Transformers for Length Generalization
by: Fan, Ying, et al.
Published: (2024)

Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
by: Chen, Yang, et al.
Published: (2024)

On Vanishing Variance in Transformer Length Generalization
by: Li, Ruining, et al.
Published: (2025)

Non-Asymptotic Analysis of (Sticky) Track-and-Stop
by: Poiani, Riccardo, et al.
Published: (2025)

Non-Asymptotic Analysis of Efficiency in Conformalized Regression
by: Yao, Yunzhen, et al.
Published: (2025)

Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework
by: Chen, Zaiwei, et al.
Published: (2026)

Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)

On the Limitations and Capabilities of Position Embeddings for Length Generalization
by: Chen, Yang, et al.
Published: (2025)

Mamba Modulation: On the Length Generalization of Mamba
by: Lu, Peng, et al.
Published: (2025)

Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula
by: Vilucchio, Matteo, et al.
Published: (2025)

Comparative Study on Semi-supervised Learning Applied for Anomaly Detection in Hydraulic Condition Monitoring System
by: Dong, Yongqi, et al.
Published: (2023)

Quantitative Bounds for Length Generalization in Transformers
by: Izzo, Zachary, et al.
Published: (2025)

Universal Length Generalization with Turing Programs
by: Hou, Kaiying, et al.
Published: (2024)

Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models
by: Cayci, Semih
Published: (2025)

A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
by: Xie, Shuo, et al.
Published: (2025)

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
by: Cho, Hanseul, et al.
Published: (2024)

On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)

Understanding and Improving Length Generalization in Recurrent Models
by: Ruiz, Ricardo Buitrago, et al.
Published: (2025)

Learning Variable-Length Tokenization for Generative Recommendation
by: Wang, Minhao, et al.
Published: (2026)

Length Generalization with Log-Depth Recurrent Units
by: Pert, Charles, et al.
Published: (2026)

Provably Minimum-Length Conformal Prediction Sets for Ordinal Classification
by: Zhang, Zijian, et al.
Published: (2025)

Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
by: Shaw, Peter, et al.
Published: (2025)

Non-Asymptotic Global Convergence of PPO-Clip
by: Liu, Yin, et al.
Published: (2025)

No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
by: Mani, Pranav, et al.
Published: (2025)