| Main Author: | Wolinski, Pierre |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.03885 |
Similar Items
Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
by: Wan, Jia, et al.
Published: (2024)
Higher-Order Newton Methods with Polynomial Work per Iteration
by: Ahmadi, Amir Ali, et al.
Published: (2023)
Higher Order Reduced Rank Regression
by: Greenberg, Leia, et al.
Published: (2025)
MARS: Unleashing the Power of Variance Reduction for Training Large Models
by: Yuan, Huizhuo, et al.
Published: (2024)
Exploiting weight-space symmetries for approximating curvature
by: Artemev, Artem, et al.
Published: (2026)
An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
by: Balzano, Laura, et al.
Published: (2025)
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
by: Xie, Xingyu, et al.
Published: (2024)
GNMR: Runtime Stability Control for Low-Precision Large Language Model Training
by: Kong, Boao, et al.
Published: (2026)
Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization
by: Takezawa, Yuki, et al.
Published: (2025)
Better LMO-based Momentum Methods with Second-Order Information
by: Khirirat, Sarit, et al.
Published: (2025)
AdaFisher: Adaptive Second Order Optimization via Fisher Information
by: Gomes, Damien Martins, et al.
Published: (2024)
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
by: Schaipp, Fabian, et al.
Published: (2025)
Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
by: Ma, Shaocong, et al.
Published: (2026)
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
by: Refael, Yehonathan, et al.
Published: (2025)
Higher-Order Group Synchronization
by: Duncan, Adriana L., et al.
Published: (2025)
Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
by: Chezhegov, Savelii, et al.
Published: (2024)
Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis
by: Gomes, Damien Martins
Published: (2025)
sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation
by: Xu, Renjie, et al.
Published: (2025)
Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization
by: Sahinoglu, Emre, et al.
Published: (2024)
On Adaptivity in Zeroth-Order Optimization
by: Dbouk, Hassan, et al.
Published: (2026)
Relaxation-Informed Training of Neural Network Surrogate Models
by: Tsay, Calvin
Published: (2026)
Training Deep Learning Models with Norm-Constrained LMOs
by: Pethick, Thomas, et al.
Published: (2025)
Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network
by: Huang, Zih-Syuan, et al.
Published: (2024)
Optimal and Order-optimal Gated Priority-based Greedy Policies for Two-layer Multi-item Order Fulfillment
by: Chen, Xi, et al.
Published: (2026)
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
by: Lau, Tim Tsz-Kit, et al.
Published: (2024)
Spherical Harmonic Optimal Transport: Application to Climate Models Comparisons
by: Houédry, Pierre, et al.
Published: (2026)
Estimating Higher-Order Mixed Memberships via the $\ell_{2,\infty}$ Tensor Perturbation Bound
by: Agterberg, Joshua, et al.
Published: (2022)
GradPower: Powering Gradients for Faster Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)
A Scalable Factorization Approach for High-Order Structured Tensor Recovery
by: Qin, Zhen, et al.
Published: (2025)
Fully First-Order Algorithms for Online Bilevel Optimization
by: Jia, Tingkai, et al.
Published: (2026)
On the Complexity of First-Order Methods in Stochastic Bilevel Optimization
by: Kwon, Jeongyeol, et al.
Published: (2024)
A Split-Client Approach to Second-Order Optimization
by: Chayti, El Mahdi, et al.
Published: (2025)
On the Inherent Privacy of Zeroth Order Projected Gradient Descent
by: Gupta, Devansh, et al.
Published: (2025)
First-Order Methods for Linearly Constrained Bilevel Optimization
by: Kornowski, Guy, et al.
Published: (2024)
A Study of Condition Numbers for First-Order Optimization
by: Guille-Escuret, Charles, et al.
Published: (2020)
DOGE-Train: Discrete Optimization on GPU with End-to-end Training
by: Abbas, Ahmed, et al.
Published: (2022)
Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR
by: Syed, Shahbaz P Qadri, et al.
Published: (2025)
A Second-Order Majorant Algorithm for Nonnegative Matrix Factorization
by: Pham, Mai-Quyen, et al.
Published: (2023)
Zeroth-Order Methods for Stochastic Nonconvex Nonsmooth Composite Optimization
by: Chen, Ziyi, et al.
Published: (2025)