| Main Authors: | Ashkboos, Saleh, Mohtashami, Amirkeivan, Croci, Maximilian L., Li, Bo, Cameron, Pashmina, Jaggi, Martin, Alistarh, Dan, Hoefler, Torsten, Hensman, James |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.00456 |
Similar Items
Beyond Outliers: A Study of Optimizers Under Quantization
by: Vlassis, Georgios, et al.
Published: (2025)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
by: Ashkboos, Saleh, et al.
Published: (2024)
CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
by: Mohtashami, Amirkeivan, et al.
Published: (2023)
HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
by: Ashkboos, Saleh, et al.
Published: (2025)
OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization
by: Gadhikar, Advait, et al.
Published: (2025)
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
by: Pagliardini, Matteo, et al.
Published: (2024)
EfQAT: An Efficient Framework for Quantization-Aware Training
by: Ashkboos, Saleh, et al.
Published: (2024)
Pyramid Vector Quantization for LLMs
by: van der Ouderaa, Tycho F. A., et al.
Published: (2024)
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
by: Frantar, Elias, et al.
Published: (2024)
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
by: Egiazarian, Vage, et al.
Published: (2025)
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
by: Chen, Jiale, et al.
Published: (2025)
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
by: Chen, Jiale, et al.
Published: (2025)
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
by: Castro, Roberto L., et al.
Published: (2025)
Apertus LLM Family Expansion via Distillation and Quantization
by: Panferov, Andrei, et al.
Published: (2026)
Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
by: Egiazarian, Vage, et al.
Published: (2026)
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
by: Li, Shigang, et al.
Published: (2019)
Low-Rank Correction for Quantized LLMs
by: Scetbon, Meyer, et al.
Published: (2024)
Social Learning: Towards Collaborative Learning with Large Language Models
by: Mohtashami, Amirkeivan, et al.
Published: (2023)
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
by: Panferov, Andrei, et al.
Published: (2025)
DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation
by: Xiang, Jingyang, et al.
Published: (2024)
Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data
by: Iofinova, Eugenia, et al.
Published: (2026)
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
by: Su, Zunhai, et al.
Published: (2025)
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
by: Li, Shigang, et al.
Published: (2020)
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
by: Chen, Sihan, et al.
Published: (2025)
Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
by: Xu, Binxing, et al.
Published: (2026)
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
by: Kleinegger, Maximilian, et al.
Published: (2026)
Getting Free Bits Back from Rotational Symmetries in LLMs
by: He, Jiajun, et al.
Published: (2024)
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
by: Nrusimha, Aniruddha, et al.
Published: (2024)
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication
by: Gianinazzi, Lukas, et al.
Published: (2024)
Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs
by: Chrapek, Marcin, et al.
Published: (2025)
SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
by: Czakó, Patrik, et al.
Published: (2025)
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
by: Abubaker, Nabil, et al.
Published: (2024)
Near-Optimal Sparse Allreduce for Distributed Deep Learning
by: Li, Shigang, et al.
Published: (2022)
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
by: Li, Shigang, et al.
Published: (2021)
Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks
by: Pankratov, Sergey, et al.
Published: (2025)
Simple Opinion Dynamics for No-Regret Learning
by: Lazarsfeld, John, et al.
Published: (2023)
LLMQ: Efficient Lower-Precision Pretraining for Consumer GPUs
by: Schultheis, Erik, et al.
Published: (2025)
Model Compression with Exact Budget Constraints via Riemannian Manifolds
by: Helcig, Michael, et al.
Published: (2026)
RotRNN: Modelling Long Sequences with Rotations
by: Biegun, Kai, et al.
Published: (2024)
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation
by: Riedel, Samuel, et al.
Published: (2024)