:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Lizhang; Li, Jonathan; Wang, Qi; Liao, Runlong; Li, Shuozhe; Liang, Chen; Lao, Ni; Liu, Qiang
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2605.15403

Similar Items

Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)

Cautious Weight Decay
by: Chen, Lizhang, et al.
Published: (2025)

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
by: Chen, Lizhang, et al.
Published: (2023)

Communication Efficient Distributed Training with Distributed Lion
by: Liu, Bo, et al.
Published: (2024)

Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts
by: Liao, Fangshuo, et al.
Published: (2025)

Training-Free Looped Transformers
by: Chen, Lizhang, et al.
Published: (2026)

Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach
by: Wang, Renzi, et al.
Published: (2024)

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
by: Han, X. Y., et al.
Published: (2025)

Hierarchical Mixture-of-Experts with Two-Stage Optimization
by: Molodtsov, Gleb, et al.
Published: (2026)

A Relaxed Wasserstein Distance Formulation for Mixtures of Radially Contoured Distributions
by: Chen, Keyu, et al.
Published: (2025)

Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
by: Farhat, Yehya, et al.
Published: (2023)

Diffusion Model for Data-Driven Black-Box Optimization
by: Li, Zihao, et al.
Published: (2024)

AutoBalance: An Automatic Balancing Framework for Training Physics-Informed Neural Networks
by: An, Kang, et al.
Published: (2025)

Gradient descent in matrix factorization: Understanding large initialization
by: Chen, Hengchao, et al.
Published: (2023)

An inexact Bregman proximal point method and its acceleration version for unbalanced optimal transport
by: Chen, Xiang, et al.
Published: (2024)

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws
by: Li, Binghui, et al.
Published: (2026)

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control: Non-Penalty Approach
by: Feng, Lechen, et al.
Published: (2025)

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control: Penalty Approach
by: Feng, Lechen, et al.
Published: (2025)

Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis
by: Li, Jiajin, et al.
Published: (2022)

Feed m Birds with One Scone: Accelerating Multi-task Gradient Balancing via Bi-level Optimization
by: Chen, Xuxing, et al.
Published: (2026)

Asynchronous and Stochastic Distributed Resource Allocation
by: Li, Qiang, et al.
Published: (2025)

MultiBalance: Multi-Objective Gradient Balancing in Industrial-Scale Multi-Task Recommendation System
by: He, Yun, et al.
Published: (2024)

Solving Sparse & High-Dimensional-Output Regression via Compression
by: Li, Renyuan, et al.
Published: (2024)

In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning
by: Li, Jindan, et al.
Published: (2025)

Proximal Oracles for Optimization and Sampling
by: Liang, Jiaming, et al.
Published: (2024)

Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
by: Chen, Cheng, et al.
Published: (2020)

Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
by: Meterez, Alexandru, et al.
Published: (2025)

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
by: Lau, Tim Tsz-Kit, et al.
Published: (2024)

Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks
by: Li, Shihao, et al.
Published: (2025)

Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data
by: Li, Wenye, et al.
Published: (2025)

Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning
by: Heng, Qiang, et al.
Published: (2025)

Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks
by: Yang, Yahong, et al.
Published: (2023)

Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks
by: Xu, Xianliang, et al.
Published: (2024)

ADMM Algorithms for Residual Network Training: Convergence Analysis and Parallel Implementation
by: Xu, Jintao, et al.
Published: (2023)

GNMR: Runtime Stability Control for Low-Precision Large Language Model Training
by: Kong, Boao, et al.
Published: (2026)

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
by: Zhang, Yufeng, et al.
Published: (2020)

Quantum Learning and Estimation for Coordinated Operation between Distribution Networks and Energy Communities
by: Zhuang, Yingrui, et al.
Published: (2025)

Adaptive Federated Minimax Optimization with Lower Complexities
by: Huang, Feihu, et al.
Published: (2022)

A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
by: Luo, Yudong, et al.
Published: (2024)

Principled Bayesian Optimisation in Collaboration with Human Experts
by: Xu, Wenjie, et al.
Published: (2024)