| Main Authors: | Cook, Jack; Guo, Junxian; Xiao, Guangxuan; Lin, Yujun; Wyss, Keith; Nazemi, Mahdi; Mishra, Asit; del Mundo, Carlo; Blankevoort, Tijmen; Han, Song |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.02010 |
Similar Items
SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization
by: Bao, Chengzhu, et al.
Published: (2026)
Adaptive Block-Scaled Data Types
by: Cook, Jack, et al.
Published: (2026)
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery
by: Xin, Meng, et al.
Published: (2026)
Pruning vs Quantization: Which is Better?
by: Kuzmin, Andrey, et al.
Published: (2023)
Optimizing Mixture of Block Attention
by: Xiao, Guangxuan, et al.
Published: (2025)
FP8 Quantization: The Power of the Exponent
by: Kuzmin, Andrey, et al.
Published: (2022)
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
by: Kopiczko, Dawid J., et al.
Published: (2026)
Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
by: Kopiczko, Dawid J., et al.
Published: (2024)
VeRA: Vector-based Random Matrix Adaptation
by: Kopiczko, Dawid J., et al.
Published: (2023)
XAttention: Block Sparse Attention with Antidiagonal Scoring
by: Xu, Ruyi, et al.
Published: (2025)
Pretraining Large Language Models with NVFP4
by: NVIDIA, et al.
Published: (2025)
GPTVQ: The Blessing of Dimensionality for LLM Quantization
by: van Baalen, Mart, et al.
Published: (2024)
NeuroBlend: Towards Low-Power yet Accurate Neural Network-Based Inference Engine Blending Binary and Fixed-Point Convolutions
by: Fayyazi, Arash, et al.
Published: (2023)
Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation
by: Azizi, Seyedarmin, et al.
Published: (2023)
MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations
by: Zou, Jiaxiang, et al.
Published: (2026)
ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs
by: Meng, Haoqian, et al.
Published: (2026)
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
by: Xiao, Guangxuan, et al.
Published: (2022)
FAAR: Format-Aware Adaptive Rounding for NVFP4
by: Li, Hanglin, et al.
Published: (2026)
Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding
by: Bergner, Benjamin, et al.
Published: (2024)
Elastic ViTs from Pretrained Models without Retraining
by: Simoncini, Walter, et al.
Published: (2025)
InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning
by: Bejnordi, Babak Ehteshami, et al.
Published: (2024)
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)
Accurate Block Quantization in LLMs with Outliers
by: Trukhanov, Nikita, et al.
Published: (2024)
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
by: Lin, Yujun, et al.
Published: (2024)
RaZeR: Pushing the Limits of NVFP4 Quantization with Redundant Zero Remapping
by: Chen, Yuzong, et al.
Published: (2025)
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
by: Azizi, Seyedarmin, et al.
Published: (2024)
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation
by: Panferov, Andrei, et al.
Published: (2026)
The LLM Surgeon
by: van der Ouderaa, Tycho F. A., et al.
Published: (2023)
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
by: Yang, Shang, et al.
Published: (2025)
Bifurcation and Quasiperiodic Behaviors of Ion Acoustic Waves in Magnetoplasmas with Nonthermal Electrons Featuring Tsallis Distribution
by: Asit Saha
Published: (2015)
Solitonic, Periodic and Quasiperiodic Behaviors of Dust Ion Acoustic Waves in Superthermal Plasmas
by: Asit Saha
Published: (2015)
Dynamic Motions of Ion Acoustic Waves in Plasmas with Superthermal Electrons
by: Asit Saha
Published: (2015)
Recipes for Pre-training LLMs with MXFP8
by: Mishra, Asit, et al.
Published: (2025)
SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)
LAUREANO CASTRO NOGUEIRA, LUIS CASTRO NOGUEIRA Y MIGUEL ÁNGEL CASTRO NOGUEIRA, ¿Quién teme a la naturaleza humana? Madrid: Tecnos, 2008
by: Jordi Mundó
Published: (2010)
Análisis crítico de la evolución de la anemia y la deficiencia de micronutrimientos en la población
by: Verónica Mundo
Published: (2007)
Simposio: Educación, convivencia e instituciones. Pilares de una visión compartida
by: Mabel Mundó
Published: (2012)
TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control
by: Chen, Yuxiang, et al.
Published: (2025)
Dissecting Outlier Dynamics in LLM NVFP4 Pretraining
by: Dong, Peijie, et al.
Published: (2026)
Biological activities (antibacterial, antifungal and cytotoxic) of secondary metabolites of Ircinia spp.
by: Nazemi, Melika
Published: (2013)