| Main Authors: | Cattaneo, Matias D., Shigida, Boris |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01642 |
Similar Items
On the Implicit Bias of Adam
by: Cattaneo, Matias D., et al.
Published: (2023)
How Memory in Optimization Algorithms Implicitly Modifies the Loss
by: Cattaneo, Matias D., et al.
Published: (2025)
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
by: Cattaneo, Matias D., et al.
Published: (2025)
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
by: Baek, Beomhan, et al.
Published: (2025)
The Rich and the Simple: On the Implicit Bias of Adam and SGD
by: Vasudeva, Bhavya, et al.
Published: (2025)
A Rod Flow Model for Adam at the Edge of Stability
by: Regis, Eric, et al.
Published: (2026)
AdamZ: An Enhanced Optimisation Method for Neural Network Training
by: Zaznov, Ilia, et al.
Published: (2024)
Optimizer-Induced Mode Connectivity: From AdamW to Muon
by: Zhang, Fangzhao, et al.
Published: (2026)
Muon Outperforms Adam in Tail-End Associative Memory Learning
by: Wang, Shuche, et al.
Published: (2025)
How Does Critical Batch Size Scale in Pre-training?
by: Zhang, Hanlin, et al.
Published: (2024)
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
by: Xie, Shuo, et al.
Published: (2024)
Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
by: Meterez, Alexandru, et al.
Published: (2025)
Self-Certifying Primal-Dual Optimization Proxies for Large-Scale Batch Economic Dispatch
by: Klamkin, Michael, et al.
Published: (2025)
The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
by: Alvo, Matias, et al.
Published: (2026)
Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise
by: Blaser, Ethan, et al.
Published: (2024)
Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
by: Choquette-Choo, Christopher A., et al.
Published: (2023)
Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
by: Farnia, Farzan, et al.
Published: (2026)
Nonlinear Non-Gaussian Density Steering with Input and Noise Channel Mismatch: Sinkhorn with Memory for Solving the Control-affine Schrödinger Bridge Problem
by: Bondar, Georgiy A., et al.
Published: (2026)
Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching
by: Attia, Amit, et al.
Published: (2024)
DT-PBO: an Interpretable Tree-based Surrogate Model for Preferential Bayesian Optimization
by: Leenders, Nick, et al.
Published: (2025)
Reward Collapse in Aligning Large Language Models
by: Song, Ziang, et al.
Published: (2023)
Stronger Approximation Guarantees for Non-Monotone γ-Weakly DR-Submodular Maximization
by: Jadav, Hareshkumar, et al.
Published: (2026)
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
by: Tao, Hongyi, et al.
Published: (2026)
Understanding Forgetting in LLM Supervised Fine-Tuning and Preference Learning -- A Convex Optimization Perspective
by: Fernando, Heshan, et al.
Published: (2024)
Solving General Natural-Language-Description Optimization Problems with Large Language Models
by: Zhang, Jihai, et al.
Published: (2024)
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)
Causal LLM Routing: End-to-End Regret Minimization from Observational Data
by: Tsiourvas, Asterios, et al.
Published: (2025)
Reinforcement Learning from Human Feedback with Active Queries
by: Ji, Kaixuan, et al.
Published: (2024)
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
by: Gautam, Tanmay, et al.
Published: (2024)
Algorithmic Challenges in Ensuring Fairness at the Time of Decision
by: Salem, Jad, et al.
Published: (2021)
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
by: Chen, Siyu, et al.
Published: (2024)
Variational Learning is Effective for Large Deep Networks
by: Shen, Yuesong, et al.
Published: (2024)
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
by: Gurses, Selcuk, et al.
Published: (2025)
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
by: Pan, Rui, et al.
Published: (2024)
Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)
Explicit and data-Efficient Encoding via Gradient Flow
by: Flouris, Kyriakos, et al.
Published: (2024)
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
by: Li, Yingcong, et al.
Published: (2025)
When and How Unlabeled Data Provably Improve In-Context Learning
by: Li, Yingcong, et al.
Published: (2025)