| Main Authors: | Volkova, Alexandra; Safaryan, Mher; Lampert, Christoph H.; Alistarh, Dan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.07712 |
Similar Items
Unified Scaling Laws for Compressed Representations
by: Panferov, Andrei, et al.
Published: (2025)
CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training
by: Tabesh, Soroush, et al.
Published: (2025)
MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
by: Modoranu, Ionut-Vlad, et al.
Published: (2026)
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
by: Robert, Thomas, et al.
Published: (2024)
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers
by: Modoranu, Ionut-Vlad, et al.
Published: (2026)
The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information
by: Wu, Diyuan, et al.
Published: (2024)
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
by: Modoranu, Ionut-Vlad, et al.
Published: (2025)
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
by: Modoranu, Ionut-Vlad, et al.
Published: (2024)
Beyond Outliers: A Study of Optimizers Under Quantization
by: Vlassis, Georgios, et al.
Published: (2025)
GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
by: Maranjyan, Artavazd, et al.
Published: (2022)
LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
by: Jovanović, Andrej, et al.
Published: (2026)
On Biased Compression for Distributed Learning
by: Beznosikov, Aleksandr, et al.
Published: (2020)
Compression Scaling Laws: Unifying Sparsity and Quantization
by: Frantar, Elias, et al.
Published: (2025)
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
by: Iacob, Alex, et al.
Published: (2025)
Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data
by: Iofinova, Eugenia, et al.
Published: (2026)
Model Compression with Exact Budget Constraints via Riemannian Manifolds
by: Helcig, Michael, et al.
Published: (2026)
Towards Combinatorial Interpretability of Neural Computation
by: Adler, Micah, et al.
Published: (2025)
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
by: Iacob, Alex, et al.
Published: (2025)
ASIDE: Architectural Separation of Instructions and Data in Language Models
by: Zverev, Egor, et al.
Published: (2025)
LLMQ: Efficient Lower-Precision Pretraining for Consumer GPUs
by: Schultheis, Erik, et al.
Published: (2025)
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
by: Jin, Tian, et al.
Published: (2025)
Efficient Data Selection at Scale via Influence Distillation
by: Nikdan, Mahdi, et al.
Published: (2025)
Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence
by: Ansaripour, Matin, et al.
Published: (2022)
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
by: Kleinegger, Maximilian, et al.
Published: (2026)
Statistically-Lossless Quantization of Large Language Models
by: Helcig, Michael, et al.
Published: (2026)
Powerset Convolutional Neural Networks
by: Wendler, Chris, et al.
Published: (2019)
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
by: Nikdan, Mahdi, et al.
Published: (2024)
Simple Opinion Dynamics for No-Regret Learning
by: Lazarsfeld, John, et al.
Published: (2023)
HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
by: Ashkboos, Saleh, et al.
Published: (2025)
Apertus LLM Family Expansion via Distillation and Quantization
by: Panferov, Andrei, et al.
Published: (2026)
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation
by: Panferov, Andrei, et al.
Published: (2026)
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
by: Sieberling, Oliver, et al.
Published: (2024)
Communication-Efficient Federated Learning With Data and Client Heterogeneity
by: Zakerinia, Hossein, et al.
Published: (2022)
Towards Scaling Laws for Symbolic Regression
by: Otte, David, et al.
Published: (2025)
Adaptive Sampling and Clipping for Private Worst-Case Group Optimization
by: Cairney-Leeming, Max, et al.
Published: (2026)
SPADE: Sparsity-Guided Debugging for Deep Neural Networks
by: Moakhar, Arshia Soltani, et al.
Published: (2023)
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
by: Kurtic, Eldar, et al.
Published: (2024)
From Privacy to Generalization: Linear Max-Information Bounds for DP-SGD
by: Lampert, Christoph H., et al.
Published: (2026)
Fast Rate Bounds for Multi-Task and Meta-Learning with Different Sample Sizes
by: Zakerinia, Hossein, et al.
Published: (2025)
1-Lipschitz Neural Networks are more expressive with N-Activations
by: Prach, Bernd, et al.
Published: (2023)