:: Library Catalog

Bibliographic Details
Main Author:	Wolinski, Pierre
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2312.03885

Similar Items

Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
by: Wan, Jia, et al.
Published: (2024)

Higher-Order Newton Methods with Polynomial Work per Iteration
by: Ahmadi, Amir Ali, et al.
Published: (2023)

Higher Order Reduced Rank Regression
by: Greenberg, Leia, et al.
Published: (2025)

MARS: Unleashing the Power of Variance Reduction for Training Large Models
by: Yuan, Huizhuo, et al.
Published: (2024)

Exploiting weight-space symmetries for approximating curvature
by: Artemev, Artem, et al.
Published: (2026)

An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
by: Balzano, Laura, et al.
Published: (2025)

LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
by: Xie, Xingyu, et al.
Published: (2024)

GNMR: Runtime Stability Control for Low-Precision Large Language Model Training
by: Kong, Boao, et al.
Published: (2026)

Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization
by: Takezawa, Yuki, et al.
Published: (2025)

Better LMO-based Momentum Methods with Second-Order Information
by: Khirirat, Sarit, et al.
Published: (2025)

AdaFisher: Adaptive Second Order Optimization via Fisher Information
by: Gomes, Damien Martins, et al.
Published: (2024)

The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
by: Schaipp, Fabian, et al.
Published: (2025)

Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
by: Ma, Shaocong, et al.
Published: (2026)

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)

LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
by: Refael, Yehonathan, et al.
Published: (2025)

Higher-Order Group Synchronization
by: Duncan, Adriana L., et al.
Published: (2025)

Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
by: Chezhegov, Savelii, et al.
Published: (2024)

Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis
by: Gomes, Damien Martins
Published: (2025)

sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation
by: Xu, Renjie, et al.
Published: (2025)

Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization
by: Sahinoglu, Emre, et al.
Published: (2024)

On Adaptivity in Zeroth-Order Optimization
by: Dbouk, Hassan, et al.
Published: (2026)

Relaxation-Informed Training of Neural Network Surrogate Models
by: Tsay, Calvin
Published: (2026)

Training Deep Learning Models with Norm-Constrained LMOs
by: Pethick, Thomas, et al.
Published: (2025)

Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network
by: Huang, Zih-Syuan, et al.
Published: (2024)

Optimal and Order-optimal Gated Priority-based Greedy Policies for Two-layer Multi-item Order Fulfillment
by: Chen, Xi, et al.
Published: (2026)

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
by: Lau, Tim Tsz-Kit, et al.
Published: (2024)

Spherical Harmonic Optimal Transport: Application to Climate Models Comparisons
by: Houédry, Pierre, et al.
Published: (2026)

Estimating Higher-Order Mixed Memberships via the $\ell_{2,\infty}$ Tensor Perturbation Bound
by: Agterberg, Joshua, et al.
Published: (2022)

GradPower: Powering Gradients for Faster Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)

A Scalable Factorization Approach for High-Order Structured Tensor Recovery
by: Qin, Zhen, et al.
Published: (2025)

Fully First-Order Algorithms for Online Bilevel Optimization
by: Jia, Tingkai, et al.
Published: (2026)

On the Complexity of First-Order Methods in Stochastic Bilevel Optimization
by: Kwon, Jeongyeol, et al.
Published: (2024)

A Split-Client Approach to Second-Order Optimization
by: Chayti, El Mahdi, et al.
Published: (2025)

On the Inherent Privacy of Zeroth Order Projected Gradient Descent
by: Gupta, Devansh, et al.
Published: (2025)

First-Order Methods for Linearly Constrained Bilevel Optimization
by: Kornowski, Guy, et al.
Published: (2024)

A Study of Condition Numbers for First-Order Optimization
by: Guille-Escuret, Charles, et al.
Published: (2020)

DOGE-Train: Discrete Optimization on GPU with End-to-end Training
by: Abbas, Ahmed, et al.
Published: (2022)

Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR
by: Syed, Shahbaz P Qadri, et al.
Published: (2025)

A Second-Order Majorant Algorithm for Nonnegative Matrix Factorization
by: Pham, Mai-Quyen, et al.
Published: (2023)

Zeroth-Order Methods for Stochastic Nonconvex Nonsmooth Composite Optimization
by: Chen, Ziyi, et al.
Published: (2025)