Bibliographic Details
Main Authors:	Xi, Haocheng; Cai, Han; Zhu, Ligeng; Lu, Yao; Keutzer, Kurt; Chen, Jianfei; Han, Song
Format:	Preprint
Published:	2024
Subjects:	Machine Learning, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.19313

Similar Items

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
by: Xi, Haocheng, et al.
Published: (2026)

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
by: He, Wenkun, et al.
Published: (2025)

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
by: Chen, Junyu, et al.
Published: (2025)

Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
by: Gu, Yuxian, et al.
Published: (2025)

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
by: Chen, Junyu, et al.
Published: (2024)

OckBench: Measuring the Efficiency of LLM Reasoning
by: Du, Zheng, et al.
Published: (2025)

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
by: Zhang, Jintao, et al.
Published: (2025)

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
by: Zhang, Jintao, et al.
Published: (2025)

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
by: Wu, Yecheng, et al.
Published: (2025)

MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
by: Zhang, Yu, et al.
Published: (2025)

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
by: Maheswaran, Monishwaran, et al.
Published: (2025)

Tiny Machine Learning: Progress and Futures
by: Lin, Ji, et al.
Published: (2024)

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
by: Li, Xingyang, et al.
Published: (2025)

Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
by: Wang, Qinsi, et al.
Published: (2025)

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
by: Li, Yitong, et al.
Published: (2026)

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
by: Wu, Yecheng, et al.
Published: (2026)

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
by: Tiwari, Rishabh, et al.
Published: (2025)

Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
by: Shi, Jiang-Xin, et al.
Published: (2025)

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search
by: Zou, Dongyun, et al.
Published: (2026)

TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model
by: Ding, Zhaoyuan, et al.
Published: (2026)

Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
by: Wang, Wenjun, et al.
Published: (2025)

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)

To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
by: Lee, Joonhyung, et al.
Published: (2024)

Training-Free Activation Sparsity in Large Language Models
by: Liu, James, et al.
Published: (2024)

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
by: Cao, Hengjie, et al.
Published: (2026)

FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
by: Wang, Fengjuan, et al.
Published: (2025)

FlashOptim: Optimizers for Memory-Efficient Training
by: Ortiz, Jose Javier Gonzalez, et al.
Published: (2026)

EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training
by: Wang, Yulin, et al.
Published: (2024)

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
by: Zhu, Rui, et al.
Published: (2026)

E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory
by: Huang, Lin, et al.
Published: (2026)

R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning
by: Shan, Hongyu, et al.
Published: (2025)

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
by: Hu, Qinghao, et al.
Published: (2025)

Flash-KMeans: Fast and Memory-Efficient Exact K-Means
by: Yang, Shuo, et al.
Published: (2026)

Residual Context Diffusion Language Models
by: Hu, Yuezhou, et al.
Published: (2026)

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
by: Hao, Yongchang, et al.
Published: (2024)

ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
by: Liu, Jing, et al.
Published: (2024)

FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
by: Chen, Zhuokun, et al.
Published: (2026)

EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
by: Zhang, Zhuoyang, et al.
Published: (2024)