:: Library Catalog

Bibliographic Details
Main Authors:	Morwani, Depen; Vyas, Nikhil; Zhang, Hanlin; Kakade, Sham
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.02431

Similar Items

The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
by: Abreu, Natalie, et al.
Published: (2025)

Deconstructing What Makes a Good Optimizer for Language Models
by: Zhao, Rosie, et al.
Published: (2024)

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging
by: Meterez, Alexandru, et al.
Published: (2026)

How Does Critical Batch Size Scale in Pre-training?
by: Zhang, Hanlin, et al.
Published: (2024)

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
by: Vyas, Nikhil, et al.
Published: (2023)

Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
by: Meterez, Alexandru, et al.
Published: (2025)

SOAP: Improving and Stabilizing Shampoo using Adam
by: Vyas, Nikhil, et al.
Published: (2024)

Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
by: Liu, Bingbin, et al.
Published: (2025)

The AdEMAMix Optimizer: Better, Faster, Older
by: Pagliardini, Matteo, et al.
Published: (2024)

A New Perspective on Shampoo's Preconditioner
by: Morwani, Depen, et al.
Published: (2024)

A Simplified Analysis of SGD for Linear Regression with Weight Averaging
by: Meterez, Alexandru, et al.
Published: (2025)

LOTION: Smoothing the Optimization Landscape for Quantized Training
by: Kwun, Mujin, et al.
Published: (2025)

Loss-to-Loss Prediction: Scaling Laws for All Datasets
by: Brandfonbrener, David, et al.
Published: (2024)

Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
by: Jin, Jikai, et al.
Published: (2025)

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
by: Zhang, Hanlin, et al.
Published: (2026)

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding
by: Oncescu, Costin-Andrei, et al.
Published: (2026)

Feature emergence via margin maximization: case studies in algebraic tasks
by: Morwani, Depen, et al.
Published: (2023)

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
by: Brandfonbrener, David, et al.
Published: (2024)

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
by: Qi, Zhenting, et al.
Published: (2024)

Learning Hidden Markov Models Using Conditional Samples
by: Kakade, Sham M., et al.
Published: (2023)

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
by: Oncescu, Costin-Andrei, et al.
Published: (2024)

A Study on the Calibration of In-context Learning
by: Zhang, Hanlin, et al.
Published: (2023)

EvoLM: In Search of Lost Language Model Training Dynamics
by: Qi, Zhenting, et al.
Published: (2025)

Repeat After Me: Transformers are Better than State Space Models at Copying
by: Jelassi, Samy, et al.
Published: (2024)

Soup to go: mitigating forgetting during continual learning with model averaging
by: Kleiman, Anat, et al.
Published: (2025)

Scaling Laws for Imitation Learning in Single-Agent Games
by: Tuyls, Jens, et al.
Published: (2023)

The Role of Sparsity for Length Generalization in Transformers
by: Golowich, Noah, et al.
Published: (2025)

Scaling Laws in Linear Regression: Compute, Parameters, and Data
by: Lin, Licong, et al.
Published: (2024)

Transcendence: Generative Models Can Outperform The Experts That Train Them
by: Zhang, Edwin, et al.
Published: (2024)

Analysis of Schedule-Free Nonconvex Optimization
by: Brown, Connor
Published: (2025)

Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
by: Zhang, Natalia, et al.
Published: (2024)

ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models
by: Defazio, Aaron
Published: (2026)

Anytime Training with Schedule-Free Spectral Optimization
by: Apte, Anuj, et al.
Published: (2026)

Peer-Predictive Self-Training for Language Model Reasoning
by: Feng, Shi, et al.
Published: (2026)

Accumulative SGD Influence Estimation for Data Attribution
by: Shi, Yunxiao, et al.
Published: (2025)

Deep Learning with Tabular Data: A Self-supervised Approach
by: Vyas, Tirth Kiranbhai
Published: (2024)

Anon: Extrapolating Adaptivity Beyond SGD and Adam
by: Zhang, Yiheng, et al.
Published: (2026)

Distinguishing the Knowable from the Unknowable with Language Models
by: Ahdritz, Gustaf, et al.
Published: (2024)

Observation-Free Attacks on Online Learning to Rank
by: Chattopadhyay, Sameep, et al.
Published: (2025)

Bootstrap SGD: Algorithmic Stability and Robustness
by: Christmann, Andreas, et al.
Published: (2024)