| Main Authors: | Nguyen, Son; Liu, Bo; Chen, Lizhang; Liu, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.07488 |
Similar Items
Memory-Efficient Optimization with Factorized Hamiltonian Descent
by: Nguyen, Son, et al.
Published: (2024)
Cautious Optimizers: Improving Training with One Line of Code
by: Liang, Kaizhao, et al.
Published: (2024)
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
by: Chen, Lizhang, et al.
Published: (2023)
Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)
Memory-Efficient LLM Training with Online Subspace Descent
by: Liang, Kaizhao, et al.
Published: (2024)
DeMo: Decoupled Momentum Optimization
by: Peng, Bowen, et al.
Published: (2024)
Training-Free Looped Transformers
by: Chen, Lizhang, et al.
Published: (2026)
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
by: Xie, Shuo, et al.
Published: (2025)
Adaptive Preconditioners Trigger Loss Spikes in Adam
by: Bai, Zhiwei, et al.
Published: (2025)
Momentum Guidance: Plug-and-Play Guidance for Flow Models
by: Liao, Runlong, et al.
Published: (2026)
Communication Efficient Distributed Training with Distributed Lion
by: Liu, Bo, et al.
Published: (2024)
Taming Preconditioner Drift: Unlocking the Potential of Second-Order Optimizers for Federated Learning on Non-IID Data
by: Liu, Junkang, et al.
Published: (2026)
$ϕ$-Balancing for Mixture-of-Experts Training
by: Chen, Lizhang, et al.
Published: (2026)
AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies
by: Hu, Xixi, et al.
Published: (2024)
Cautious Weight Decay
by: Chen, Lizhang, et al.
Published: (2025)
Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
by: Liu, Ziyue, et al.
Published: (2026)
SAMix: Calibrated and Accurate Continual Learning via Sphere-Adaptive Mixup and Neural Collapse
by: Dang, Trung-Anh, et al.
Published: (2025)
Curvature-Informed SGD via General Purpose Lie-Group Preconditioners
by: Pooladzandi, Omead, et al.
Published: (2024)
CUPID in the Model Zoo: Online Matchmaking for Selecting Your Dream LLM
by: Nguyen, Son, et al.
Published: (2026)
Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling
by: Zhang, Liu, et al.
Published: (2025)
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
by: Chen, Jie
Published: (2024)
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
by: Yang, Zherui, et al.
Published: (2025)
A New Perspective on Shampoo's Preconditioner
by: Morwani, Depen, et al.
Published: (2024)
Gaussian Processes Sampling with Sparse Grids under Additive Schwarz Preconditioner
by: Chen, Haoyuan, et al.
Published: (2024)
Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching
by: Ou, Zijing, et al.
Published: (2024)
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum
by: Zhang, Minxin, et al.
Published: (2026)
Regime-Adaptive Bayesian Optimization via Dirichlet Process Mixtures of Gaussian Processes
by: Zhang, Yan, et al.
Published: (2026)
A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay
by: Zhao, JiangBo, et al.
Published: (2026)
An Experimental Study of Semantic Continuity for Deep Learning Models
by: Wu, Shangxi, et al.
Published: (2020)
Adaptive Estimation and Inference in Conditional Moment Models via the Discrepancy Principle
by: Tan, Jiyuan, et al.
Published: (2026)
Spectral Embeddings Leak Graph Topology: Theory, Benchmark, and Adaptive Reconstruction
by: Nguyen-Cong, Thinh, et al.
Published: (2026)
Diagonal Adaptive Non-local Observables on Quantum Neural Networks
by: Tseng, Huan-Hsin, et al.
Published: (2026)
Optimization Insights into Deep Diagonal Linear Networks
by: Labarrière, Hippolyte, et al.
Published: (2024)
Generative modeling of Sparse Approximate Inverse Preconditioners
by: Li, Mou, et al.
Published: (2024)
Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity
by: Li, Yicheng, et al.
Published: (2025)
Preconditioners for the Stochastic Training of Neural Fields
by: Chng, Shin-Fang, et al.
Published: (2024)
Improving Rectified Flow with Boundary Conditions
by: Hu, Xixi, et al.
Published: (2025)
Improving Deep Knowledge Tracing via Gated Architectures and Adaptive Optimization
by: Shukurlu, Altun
Published: (2025)
Adaptive Moment Estimation Optimization Algorithm Using Projection Gradient for Deep Learning
by: Li, Yongqi, et al.
Published: (2025)
GADPN: Graph Adaptive Denoising and Perturbation Networks via Singular Value Decomposition
by: Deng, Hao, et al.
Published: (2026)