:: Library Catalog

Bibliographic Details
Main Authors:	Narayan, Saaketh; Gupta, Abhay; Paul, Mansheej; Blalock, Davis
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.05967

Similar Items

FlashOptim: Optimizers for Memory-Efficient Training
by: Ortiz, Jose Javier Gonzalez, et al.
Published: (2026)

Towards Fully FP8 GEMM LLM Training at Scale
by: Hernández-Cano, Alejandro, et al.
Published: (2025)

MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
by: Zhang, Yu, et al.
Published: (2025)

To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
by: Lee, Joonhyung, et al.
Published: (2024)

An Inquiry into Datacenter TCO for LLM Inference with FP8
by: Kim, Jiwoo, et al.
Published: (2025)

Scaling FP8 training to trillion-token LLMs
by: Fishman, Maxim, et al.
Published: (2024)

Critique-out-Loud Reward Models
by: Ankner, Zachary, et al.
Published: (2024)

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
by: Cao, Hengjie, et al.
Published: (2026)

FP8 Quantization: The Power of the Exponent
by: Kuzmin, Andrey, et al.
Published: (2022)

Scaling Laws for Precision
by: Kumar, Tanishq, et al.
Published: (2024)

Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines
by: Choudhary, Aditya, et al.
Published: (2026)

Soup to go: mitigating forgetting during continual learning with model averaging
by: Kleiman, Anat, et al.
Published: (2025)

Does your data spark joy? Performance gains from domain upsampling at the end of training
by: Blakeney, Cody, et al.
Published: (2024)

Metis: Training LLMs with FP4 Quantization
by: Cao, Hengjie, et al.
Published: (2025)

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
by: Xi, Haocheng, et al.
Published: (2024)

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs
by: Kabra, Sanchit, et al.
Published: (2026)

Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs
by: Fujii, Kazuki, et al.
Published: (2024)

Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
by: Zhang, Wuyue, et al.
Published: (2026)

FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
by: Wang, Fengjuan, et al.
Published: (2025)

ZClip: Adaptive Spike Mitigation for LLM Pre-Training
by: Kumar, Abhay, et al.
Published: (2025)

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
by: Ankner, Zachary, et al.
Published: (2024)

Elucidating the Design Space of FP4 training
by: Hu, Robert, et al.
Published: (2025)

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
by: Qiu, Zhaopeng, et al.
Published: (2026)

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
by: Xi, Haocheng, et al.
Published: (2026)

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
by: Thangarasa, Vithursan, et al.
Published: (2023)

FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
by: Baek, Daehyeon, et al.
Published: (2025)

TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies
by: Liang, Guang, et al.
Published: (2025)

FP4 All the Way: Fully Quantized Training of LLMs
by: Chmiel, Brian, et al.
Published: (2025)

Defeating the Training-Inference Mismatch via FP16
by: Qi, Penghui, et al.
Published: (2025)

Quartet: Native FP4 Training Can Be Optimal for Large Language Models
by: Castro, Roberto L., et al.
Published: (2025)

Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)

Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
by: Brinkmann, Jannik, et al.
Published: (2024)

Optimizing Large Language Model Training Using FP4 Quantization
by: Wang, Ruizhe, et al.
Published: (2025)

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
by: Li, Yitong, et al.
Published: (2026)

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)

Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training
by: Nikolić, Miloš, et al.
Published: (2022)

SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training
by: Rajabi, Sahar, et al.
Published: (2025)

WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
by: Wu, Junyu, et al.
Published: (2025)

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
by: Zhang, Jintao, et al.
Published: (2025)