:: Library Catalog

Bibliographic Details
Main Authors:	Medvedev, Marko; Lyu, Kaifeng; Yu, Dingli; Arora, Sanjeev; Li, Zhiyuan; Srebro, Nathan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.02877

Similar Items

Shift is Good: Mismatched Data Mixing Improves Test Performance
by: Medvedev, Marko, et al.
Published: (2025)

Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
by: Medvedev, Marko, et al.
Published: (2024)

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
by: Lyu, Kaifeng, et al.
Published: (2024)

Positive Distribution Shift as a Framework for Understanding Tractable Learning
by: Medvedev, Marko, et al.
Published: (2026)

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
by: Malladi, Sadhika, et al.
Published: (2022)

Recursive Models for Long-Horizon Reasoning
by: Yang, Chenxiao, et al.
Published: (2026)

AI-Assisted Generation of Difficult Math Questions
by: Shah, Vedant, et al.
Published: (2024)

A Quadratic Synchronization Rule for Distributed Deep Learning
by: Gu, Xinran, et al.
Published: (2023)

Can Models Learn Skill Composition from Examples?
by: Zhao, Haoyu, et al.
Published: (2024)

From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
by: Oh, Junsoo, et al.
Published: (2025)

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
by: Park, Simon, et al.
Published: (2025)

Provable Tempered Overfitting of Minimal Nets and Typical Nets
by: Harel, Itamar, et al.
Published: (2024)

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)

Noisy Interpolation Learning with Shallow Univariate ReLU Networks
by: Joshi, Nirmit, et al.
Published: (2023)

PENCIL: Long Thoughts with Short Memory
by: Yang, Chenxiao, et al.
Published: (2025)

Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification
by: Zhu, Xiaohan, et al.
Published: (2025)

Provable Weak-to-Strong Generalization via Benign Overfitting
by: Wu, David X., et al.
Published: (2024)

Provable unlearning in topic modeling and downstream tasks
by: Wei, Stanley, et al.
Published: (2024)

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
by: Buzaglo, Gon, et al.
Published: (2024)

Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification
by: Zhu, Xiaohan, et al.
Published: (2026)

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
by: Li, Binghui, et al.
Published: (2024)

The Price of Implicit Bias in Adversarially Robust Generalization
by: Tsilivis, Nikolaos, et al.
Published: (2024)

Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
by: Wu, Diyuan, et al.
Published: (2026)

Depth Separation in Norm-Bounded Infinite-Width Neural Networks
by: Parkinson, Suzanna, et al.
Published: (2024)

Strong and Weak Random Walks on Signed Networks
by: Babul, Shazia'Ayn, et al.
Published: (2024)

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries
by: Joshi, Nirmit, et al.
Published: (2024)

Tight Bounds on the Binomial CDF, and the Minimum of i.i.d Binomials, in terms of KL-Divergence
by: Zhu, Xiaohan, et al.
Published: (2025)

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
by: Harel, Itamar, et al.
Published: (2025)

Research Program: Theory of Learning in Dynamical Systems
by: Hazan, Elad, et al.
Published: (2025)

The Marginal Value of Momentum for Small Learning Rate SGD
by: Wang, Runzhe, et al.
Published: (2023)

A Theory of Learning with Autoregressive Chain of Thought
by: Joshi, Nirmit, et al.
Published: (2025)

An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression
by: Zhou, Lijia, et al.
Published: (2023)

On the Hardness of Learning Regular Expressions
by: Attias, Idan, et al.
Published: (2025)

Learning single-index models via harmonic decomposition
by: Joshi, Nirmit, et al.
Published: (2025)

Provable Benefits of Sinusoidal Activation for Modular Addition
by: Huang, Tianlong, et al.
Published: (2025)

The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge
by: Awano, Ryoya, et al.
Published: (2026)

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
by: Li, Xinghan, et al.
Published: (2025)

Mixture of Weak & Strong Experts on Graphs
by: Zeng, Hanqing, et al.
Published: (2023)

On the Impossibility of Retrain Equivalence in Machine Unlearning
by: Yu, Jiatong, et al.
Published: (2025)

How Does RL Post-training Induce Skill Composition? A Case Study on Countdown
by: Park, Simon, et al.
Published: (2025)