| Main Authors: | Tseng, Albert, Yu, Tao, Park, Youngsuk |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.20586 |
Similar Items
Stochastic Rounding for LLM Training: Theory and Practice
by: Ozkara, Kaan, et al.
Published: (2025)
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
by: Bian, Song, et al.
Published: (2025)
Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)
TritonRL: Training LLMs to Think and Code Triton Without Cheating
by: Woo, Jiin, et al.
Published: (2025)
Recipes for Pre-training LLMs with MXFP8
by: Mishra, Asit, et al.
Published: (2025)
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
by: Liu, Hongyi, et al.
Published: (2025)
MuonBP: Faster Muon via Block-Periodic Orthogonalization
by: Khaled, Ahmed, et al.
Published: (2025)
Block Rotation is All You Need for MXFP4 Quantization
by: Shao, Yuantian, et al.
Published: (2025)
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)
Pretraining large language models with MXFP4 on Native FP4 Hardware
by: Cim, Musa, et al.
Published: (2026)
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)
Collage: Light-Weight Low-Precision Strategy for LLM Training
by: Yu, Tao, et al.
Published: (2024)
ProxSparse: Regularized Learning of Semi-Structured Sparsity Masks for Pretrained LLMs
by: Liu, Hongyi, et al.
Published: (2025)
Online Posterior Sampling with a Diffusion Prior
by: Kveton, Branislav, et al.
Published: (2024)
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
by: McLean, Callum, et al.
Published: (2026)
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
by: Ding, Yifu, et al.
Published: (2026)
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
by: Li, Xiaocan, et al.
Published: (2026)
Shadow Cones: A Generalized Framework for Partial Order Embeddings
by: Yu, Tao, et al.
Published: (2023)
Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
by: Hasson, Hilaf, et al.
Published: (2023)
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
by: Gautam, Tanmay, et al.
Published: (2024)
L$^3$: Large Lookup Layers
by: Tseng, Albert, et al.
Published: (2026)
Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
by: Wei, Quan, et al.
Published: (2025)
Verifier-free Test-Time Sampling for Vision Language Action Models
by: Jang, Suhyeok, et al.
Published: (2025)
Model-Preserving Adaptive Rounding
by: Tseng, Albert, et al.
Published: (2025)
Metis: Training LLMs with FP4 Quantization
by: Cao, Hengjie, et al.
Published: (2025)
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
by: Ouyang, Xu, et al.
Published: (2024)
QTIP: Quantization with Trellises and Incoherence Processing
by: Tseng, Albert, et al.
Published: (2024)
MuCon: Clipped Muon Updates for LLM Training
by: Yi, Albert
Published: (2026)
Training Dynamics Impact Post-Training Quantization Robustness
by: Catalan-Tatjer, Albert, et al.
Published: (2025)
Inference Optimization of Foundation Models on AI Accelerators
by: Park, Youngsuk, et al.
Published: (2024)
MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data
by: Kuo, Yu-Chen, et al.
Published: (2025)
Learning-Based WiFi Fingerprint Inpainting via Generative Adversarial Networks
by: Chan, Yu, et al.
Published: (2024)
StreetMath: Study of LLMs' Approximation Behaviors
by: Tseng, Chiung-Yi, et al.
Published: (2025)
Laplace Approximation For Tensor Train Kernel Machines In System Identification
by: Saiapin, Albert, et al.
Published: (2025)
Physics-Informed Neural Network for Predicting Out-of-Training-Range TCAD Solution with Minimized Domain Expertise
by: Lu, Albert, et al.
Published: (2024)
FP4 All the Way: Fully Quantized Training of LLMs
by: Chmiel, Brian, et al.
Published: (2025)
CAAP: Class-Dependent Automatic Data Augmentation Based On Adaptive Policies For Time Series
by: Chang, Tien-Yu, et al.
Published: (2024)
Test-Time Training on Graphs with Large Language Models (LLMs)
by: Zhang, Jiaxin, et al.
Published: (2024)
LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters
by: Chang, Ching, et al.
Published: (2023)