:: Library Catalog

Bibliographic Details
Main Authors:	Cattaneo, Matias D.; Shigida, Boris
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Optimization and Control; Computation
Online Access:	https://arxiv.org/abs/2602.01642

Similar Items

On the Implicit Bias of Adam
by: Cattaneo, Matias D., et al.
Published: (2023)

How Memory in Optimization Algorithms Implicitly Modifies the Loss
by: Cattaneo, Matias D., et al.
Published: (2025)

Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
by: Cattaneo, Matias D., et al.
Published: (2025)

Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
by: Baek, Beomhan, et al.
Published: (2025)

The Rich and the Simple: On the Implicit Bias of Adam and SGD
by: Vasudeva, Bhavya, et al.
Published: (2025)

A Rod Flow Model for Adam at the Edge of Stability
by: Regis, Eric, et al.
Published: (2026)

AdamZ: An Enhanced Optimisation Method for Neural Network Training
by: Zaznov, Ilia, et al.
Published: (2024)

Optimizer-Induced Mode Connectivity: From AdamW to Muon
by: Zhang, Fangzhao, et al.
Published: (2026)

Muon Outperforms Adam in Tail-End Associative Memory Learning
by: Wang, Shuche, et al.
Published: (2025)

How Does Critical Batch Size Scale in Pre-training?
by: Zhang, Hanlin, et al.
Published: (2024)

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
by: Xie, Shuo, et al.
Published: (2024)

Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
by: Meterez, Alexandru, et al.
Published: (2025)

Self-Certifying Primal-Dual Optimization Proxies for Large-Scale Batch Economic Dispatch
by: Klamkin, Michael, et al.
Published: (2025)

The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
by: Alvo, Matias, et al.
Published: (2026)

Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise
by: Blaser, Ethan, et al.
Published: (2024)

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
by: Choquette-Choo, Christopher A., et al.
Published: (2023)

Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
by: Farnia, Farzan, et al.
Published: (2026)

Nonlinear Non-Gaussian Density Steering with Input and Noise Channel Mismatch: Sinkhorn with Memory for Solving the Control-affine Schrödinger Bridge Problem
by: Bondar, Georgiy A., et al.
Published: (2026)

Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching
by: Attia, Amit, et al.
Published: (2024)

DT-PBO: an Interpretable Tree-based Surrogate Model for Preferential Bayesian Optimization
by: Leenders, Nick, et al.
Published: (2025)

Reward Collapse in Aligning Large Language Models
by: Song, Ziang, et al.
Published: (2023)

Stronger Approximation Guarantees for Non-Monotone γ-Weakly DR-Submodular Maximization
by: Jadav, Hareshkumar, et al.
Published: (2026)

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
by: Tao, Hongyi, et al.
Published: (2026)

Understanding Forgetting in LLM Supervised Fine-Tuning and Preference Learning -- A Convex Optimization Perspective
by: Fernando, Heshan, et al.
Published: (2024)

Solving General Natural-Language-Description Optimization Problems with Large Language Models
by: Zhang, Jihai, et al.
Published: (2024)

One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)

Causal LLM Routing: End-to-End Regret Minimization from Observational Data
by: Tsiourvas, Asterios, et al.
Published: (2025)

Reinforcement Learning from Human Feedback with Active Queries
by: Ji, Kaixuan, et al.
Published: (2024)

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
by: Gautam, Tanmay, et al.
Published: (2024)

Algorithmic Challenges in Ensuring Fairness at the Time of Decision
by: Salem, Jad, et al.
Published: (2021)

Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
by: Chen, Siyu, et al.
Published: (2024)

Variational Learning is Effective for Large Deep Networks
by: Shen, Yuesong, et al.
Published: (2024)

DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
by: Gurses, Selcuk, et al.
Published: (2025)

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
by: Pan, Rui, et al.
Published: (2024)

Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)

Explicit and data-Efficient Encoding via Gradient Flow
by: Flouris, Kyriakos, et al.
Published: (2024)

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
by: Li, Yingcong, et al.
Published: (2025)

When and How Unlabeled Data Provably Improve In-Context Learning
by: Li, Yingcong, et al.
Published: (2025)