| Main Authors: | Medvedev, Marko, Lyu, Kaifeng, Yu, Dingli, Arora, Sanjeev, Li, Zhiyuan, Srebro, Nathan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.02877 |
Similar Items
Shift is Good: Mismatched Data Mixing Improves Test Performance
by: Medvedev, Marko, et al.
Published: (2025)
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
by: Medvedev, Marko, et al.
Published: (2024)
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
by: Lyu, Kaifeng, et al.
Published: (2024)
Positive Distribution Shift as a Framework for Understanding Tractable Learning
by: Medvedev, Marko, et al.
Published: (2026)
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
by: Malladi, Sadhika, et al.
Published: (2022)
Recursive Models for Long-Horizon Reasoning
by: Yang, Chenxiao, et al.
Published: (2026)
AI-Assisted Generation of Difficult Math Questions
by: Shah, Vedant, et al.
Published: (2024)
A Quadratic Synchronization Rule for Distributed Deep Learning
by: Gu, Xinran, et al.
Published: (2023)
Can Models Learn Skill Composition from Examples?
by: Zhao, Haoyu, et al.
Published: (2024)
From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
by: Oh, Junsoo, et al.
Published: (2025)
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
by: Park, Simon, et al.
Published: (2025)
Provable Tempered Overfitting of Minimal Nets and Typical Nets
by: Harel, Itamar, et al.
Published: (2024)
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)
Noisy Interpolation Learning with Shallow Univariate ReLU Networks
by: Joshi, Nirmit, et al.
Published: (2023)
PENCIL: Long Thoughts with Short Memory
by: Yang, Chenxiao, et al.
Published: (2025)
Quantifying Overfitting along the Regularization Path for Two-Part-Code MDL in Supervised Classification
by: Zhu, Xiaohan, et al.
Published: (2025)
Provable Weak-to-Strong Generalization via Benign Overfitting
by: Wu, David X., et al.
Published: (2024)
Provable unlearning in topic modeling and downstream tasks
by: Wei, Stanley, et al.
Published: (2024)
How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
by: Buzaglo, Gon, et al.
Published: (2024)
Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification
by: Zhu, Xiaohan, et al.
Published: (2026)
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
by: Li, Binghui, et al.
Published: (2024)
The Price of Implicit Bias in Adversarially Robust Generalization
by: Tsilivis, Nikolaos, et al.
Published: (2024)
Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
by: Wu, Diyuan, et al.
Published: (2026)
Depth Separation in Norm-Bounded Infinite-Width Neural Networks
by: Parkinson, Suzanna, et al.
Published: (2024)
Strong and Weak Random Walks on Signed Networks
by: Babul, Shazia'Ayn, et al.
Published: (2024)
On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries
by: Joshi, Nirmit, et al.
Published: (2024)
Tight Bounds on the Binomial CDF, and the Minimum of i.i.d Binomials, in terms of KL-Divergence
by: Zhu, Xiaohan, et al.
Published: (2025)
Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
by: Harel, Itamar, et al.
Published: (2025)
Research Program: Theory of Learning in Dynamical Systems
by: Hazan, Elad, et al.
Published: (2025)
The Marginal Value of Momentum for Small Learning Rate SGD
by: Wang, Runzhe, et al.
Published: (2023)
A Theory of Learning with Autoregressive Chain of Thought
by: Joshi, Nirmit, et al.
Published: (2025)
An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression
by: Zhou, Lijia, et al.
Published: (2023)
On the Hardness of Learning Regular Expressions
by: Attias, Idan, et al.
Published: (2025)
Learning single-index models via harmonic decomposition
by: Joshi, Nirmit, et al.
Published: (2025)
Provable Benefits of Sinusoidal Activation for Modular Addition
by: Huang, Tianlong, et al.
Published: (2025)
The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge
by: Awano, Ryoya, et al.
Published: (2026)
Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
by: Li, Xinghan, et al.
Published: (2025)
Mixture of Weak & Strong Experts on Graphs
by: Zeng, Hanqing, et al.
Published: (2023)
On the Impossibility of Retrain Equivalence in Machine Unlearning
by: Yu, Jiatong, et al.
Published: (2025)
How Does RL Post-training Induce Skill Composition? A Case Study on Countdown
by: Park, Simon, et al.
Published: (2025)