:: Library Catalog

Bibliographic Details
Main Authors:	Modoranu, Ionut-Vlad; Zmushko, Philip; Schultheis, Erik; Safaryan, Mher; Alistarh, Dan
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.02016

Similar Items

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
by: Modoranu, Ionut-Vlad, et al.
Published: (2026)

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information
by: Wu, Diyuan, et al.
Published: (2024)

FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
by: Modoranu, Ionut-Vlad, et al.
Published: (2025)

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
by: Robert, Thomas, et al.
Published: (2024)

Unified Scaling Laws for Compressed Representations
by: Panferov, Andrei, et al.
Published: (2025)

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
by: Modoranu, Ionut-Vlad, et al.
Published: (2024)

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
by: Jovanović, Andrej, et al.
Published: (2026)

Towards Robust Scaling Laws for Optimizers
by: Volkova, Alexandra, et al.
Published: (2026)

Error Feedback Can Accurately Compress Preconditioners
by: Modoranu, Ionut-Vlad, et al.
Published: (2023)

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training
by: Tabesh, Soroush, et al.
Published: (2025)

LLMQ: Efficient Lower-Precision Pretraining for Consumer GPUs
by: Schultheis, Erik, et al.
Published: (2025)

Optimizers Qualitatively Alter Solutions And We Should Leverage This
by: Pascanu, Razvan, et al.
Published: (2025)

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation
by: Panferov, Andrei, et al.
Published: (2026)

GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
by: Maranjyan, Artavazd, et al.
Published: (2022)

On Biased Compression for Distributed Learning
by: Beznosikov, Aleksandr, et al.
Published: (2020)

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
by: Egiazarian, Vage, et al.
Published: (2026)

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
by: Zmushko, Philip, et al.
Published: (2024)

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
by: Rodionov, Gleb, et al.
Published: (2025)

Model Compression with Exact Budget Constraints via Riemannian Manifolds
by: Helcig, Michael, et al.
Published: (2026)

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence
by: Ansaripour, Matin, et al.
Published: (2022)

MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
by: Kleinegger, Maximilian, et al.
Published: (2026)

Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-Sided and Two-Sided Preconditioning
by: Li, Huan, et al.
Published: (2026)

Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data
by: Iofinova, Eugenia, et al.
Published: (2026)

Efficient Data Selection at Scale via Influence Distillation
by: Nikdan, Mahdi, et al.
Published: (2025)

Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
by: Eschenhagen, Runa, et al.
Published: (2025)

Label Privacy in Split Learning for Large Models with Parameter-Efficient Training
by: Zmushko, Philip, et al.
Published: (2024)

Communication-Efficient Federated Learning With Data and Client Heterogeneity
by: Zakerinia, Hossein, et al.
Published: (2022)

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
by: Nikdan, Mahdi, et al.
Published: (2024)

Apertus LLM Family Expansion via Distillation and Quantization
by: Panferov, Andrei, et al.
Published: (2026)

4-bit Shampoo for Memory-Efficient Network Training
by: Wang, Sike, et al.
Published: (2024)

EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
by: Sieberling, Oliver, et al.
Published: (2024)

SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching
by: Labarrière, Hippolyte, et al.
Published: (2026)

Statistically-Lossless Quantization of Large Language Models
by: Helcig, Michael, et al.
Published: (2026)

Powerset Convolutional Neural Networks
by: Wendler, Chris, et al.
Published: (2019)

Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning
by: Semenov, Andrei, et al.
Published: (2024)

Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization
by: Lin, Wu, et al.
Published: (2025)

Solving Dense Linear Systems Faster Than via Preconditioning
by: Dereziński, Michał, et al.
Published: (2023)

Sign-SGD via Parameter-Free Optimization
by: Medyakov, Daniil, et al.
Published: (2025)

MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
by: Iacob, Alex, et al.
Published: (2025)

Simple Opinion Dynamics for No-Regret Learning
by: Lazarsfeld, John, et al.
Published: (2023)