| Main Authors: | Narayan, Saaketh, Gupta, Abhay, Paul, Mansheej, Blalock, Davis |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.05967 |
Similar Items
FlashOptim: Optimizers for Memory-Efficient Training
by: Ortiz, Jose Javier Gonzalez, et al.
Published: (2026)
Towards Fully FP8 GEMM LLM Training at Scale
by: Hernández-Cano, Alejandro, et al.
Published: (2025)
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
by: Zhang, Yu, et al.
Published: (2025)
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
by: Lee, Joonhyung, et al.
Published: (2024)
An Inquiry into Datacenter TCO for LLM Inference with FP8
by: Kim, Jiwoo, et al.
Published: (2025)
Scaling FP8 training to trillion-token LLMs
by: Fishman, Maxim, et al.
Published: (2024)
Critique-out-Loud Reward Models
by: Ankner, Zachary, et al.
Published: (2024)
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
by: Cao, Hengjie, et al.
Published: (2026)
FP8 Quantization: The Power of the Exponent
by: Kuzmin, Andrey, et al.
Published: (2022)
Scaling Laws for Precision
by: Kumar, Tanishq, et al.
Published: (2024)
Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines
by: Choudhary, Aditya, et al.
Published: (2026)
Soup to go: mitigating forgetting during continual learning with model averaging
by: Kleiman, Anat, et al.
Published: (2025)
Does your data spark joy? Performance gains from domain upsampling at the end of training
by: Blakeney, Cody, et al.
Published: (2024)
Metis: Training LLMs with FP4 Quantization
by: Cao, Hengjie, et al.
Published: (2025)
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
by: Xi, Haocheng, et al.
Published: (2024)
LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs
by: Kabra, Sanchit, et al.
Published: (2026)
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs
by: Fujii, Kazuki, et al.
Published: (2024)
Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
by: Zhang, Wuyue, et al.
Published: (2026)
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
by: Wang, Fengjuan, et al.
Published: (2025)
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
by: Kumar, Abhay, et al.
Published: (2025)
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
by: Ankner, Zachary, et al.
Published: (2024)
Elucidating the Design Space of FP4 training
by: Hu, Robert, et al.
Published: (2025)
FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
by: Qiu, Zhaopeng, et al.
Published: (2026)
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
by: Xi, Haocheng, et al.
Published: (2026)
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
by: Thangarasa, Vithursan, et al.
Published: (2023)
FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
by: Baek, Daehyeon, et al.
Published: (2025)
TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies
by: Liang, Guang, et al.
Published: (2025)
FP4 All the Way: Fully Quantized Training of LLMs
by: Chmiel, Brian, et al.
Published: (2025)
Defeating the Training-Inference Mismatch via FP16
by: Qi, Penghui, et al.
Published: (2025)
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
by: Castro, Roberto L., et al.
Published: (2025)
Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)
Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
by: Brinkmann, Jannik, et al.
Published: (2024)
Optimizing Large Language Model Training Using FP4 Quantization
by: Wang, Ruizhe, et al.
Published: (2025)
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
by: Li, Yitong, et al.
Published: (2026)
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)
Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training
by: Nikolić, Miloš, et al.
Published: (2022)
SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training
by: Rajabi, Sahar, et al.
Published: (2025)
WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
by: Wu, Junyu, et al.
Published: (2025)
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
by: Zhang, Jintao, et al.
Published: (2025)