| Main Authors: | Xi, Haocheng, Cai, Han, Zhu, Ligeng, Lu, Yao, Keutzer, Kurt, Chen, Jianfei, Han, Song |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.19313 |
Similar Items
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
by: Xi, Haocheng, et al.
Published: (2026)
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
by: He, Wenkun, et al.
Published: (2025)
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
by: Chen, Junyu, et al.
Published: (2025)
Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
by: Gu, Yuxian, et al.
Published: (2025)
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
by: Chen, Junyu, et al.
Published: (2024)
OckBench: Measuring the Efficiency of LLM Reasoning
by: Du, Zheng, et al.
Published: (2025)
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
by: Zhang, Jintao, et al.
Published: (2025)
SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
by: Zhang, Jintao, et al.
Published: (2025)
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
by: Wu, Yecheng, et al.
Published: (2025)
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
by: Zhang, Yu, et al.
Published: (2025)
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
by: Maheswaran, Monishwaran, et al.
Published: (2025)
Tiny Machine Learning: Progress and Futures
by: Lin, Ji, et al.
Published: (2024)
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
by: Li, Xingyang, et al.
Published: (2025)
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
by: Wang, Qinsi, et al.
Published: (2025)
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
by: Li, Yitong, et al.
Published: (2026)
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
by: Wu, Yecheng, et al.
Published: (2026)
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
by: Tiwari, Rishabh, et al.
Published: (2025)
Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
by: Shi, Jiang-Xin, et al.
Published: (2025)
JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search
by: Zou, Dongyun, et al.
Published: (2026)
TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model
by: Ding, Zhaoyuan, et al.
Published: (2026)
Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
by: Wang, Wenjun, et al.
Published: (2025)
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
by: Lee, Joonhyung, et al.
Published: (2024)
Training-Free Activation Sparsity in Large Language Models
by: Liu, James, et al.
Published: (2024)
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
by: Cao, Hengjie, et al.
Published: (2026)
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
by: Wang, Fengjuan, et al.
Published: (2025)
FlashOptim: Optimizers for Memory-Efficient Training
by: Ortiz, Jose Javier Gonzalez, et al.
Published: (2026)
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training
by: Wang, Yulin, et al.
Published: (2024)
How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
by: Zhu, Rui, et al.
Published: (2026)
E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory
by: Huang, Lin, et al.
Published: (2026)
R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning
by: Shan, Hongyu, et al.
Published: (2025)
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
by: Hu, Qinghao, et al.
Published: (2025)
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
by: Yang, Shuo, et al.
Published: (2026)
Residual Context Diffusion Language Models
by: Hu, Yuezhou, et al.
Published: (2026)
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
by: Hao, Yongchang, et al.
Published: (2024)
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
by: Liu, Jing, et al.
Published: (2024)
FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
by: Chen, Zhuokun, et al.
Published: (2026)
EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
by: Zhang, Zhuoyang, et al.
Published: (2024)