Library Catalog

Bibliographic Details
Main Authors:	Hu, Yuezhou; Zhao, Kang; Huang, Weiyu; Chen, Jianfei; Zhu, Jun
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2404.01847

Similar Items

CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models
by: Huang, Weiyu, et al.
Published: (2025)

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
by: Hu, Yuezhou, et al.
Published: (2024)

Identifying Sensitive Weights via Post-quantization Integral
by: Hu, Yuezhou, et al.
Published: (2025)

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
by: Huang, Weiyu, et al.
Published: (2024)

Deterministic Differentiable Structured Pruning for Large Language Models
by: Huang, Weiyu, et al.
Published: (2026)

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
by: Li, Kaiwei, et al.
Published: (2016)

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
by: Xi, Haocheng, et al.
Published: (2024)

Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
by: Gao, Chang, et al.
Published: (2025)

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
by: Zhang, Jintao, et al.
Published: (2026)

Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
by: Xi, Haocheng, et al.
Published: (2025)

To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training
by: Madhyastha, Meghana, et al.
Published: (2026)

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
by: Haziza, Daniel, et al.
Published: (2025)

Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)

Mixed Sparsity Training: Achieving 4$\times$ FLOP Reduction for Transformer Pretraining
by: Hu, Pihe, et al.
Published: (2024)

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
by: Zhang, Jintao, et al.
Published: (2024)

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
by: Zhang, Jintao, et al.
Published: (2025)

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)

Graph Generative Pre-trained Transformer
by: Chen, Xiaohui, et al.
Published: (2025)

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
by: Wang, Ziteng, et al.
Published: (2024)

Efficient Backpropagation with Variance-Controlled Adaptive Sampling
by: Wang, Ziteng, et al.
Published: (2024)

LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model
by: Wang, Huizheng, et al.
Published: (2025)

BrainNPT: Pre-training of Transformer networks for brain network classification
by: Hu, Jinlong, et al.
Published: (2023)

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)

Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
by: Wang, Lirui, et al.
Published: (2024)

Split Adaptation for Pre-trained Vision Transformers
by: Wang, Lixu, et al.
Published: (2025)

Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-training
by: He, Yufei, et al.
Published: (2024)

Towards Pre-trained Graph Condensation via Optimal Transport
by: Yan, Yeyu, et al.
Published: (2025)

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
by: Huang, Ning-Chi, et al.
Published: (2024)

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
by: Hu, Yuezhou, et al.
Published: (2025)

Timer: Generative Pre-trained Transformers Are Large Time Series Models
by: Liu, Yong, et al.
Published: (2024)

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
by: Pan, Zizheng, et al.
Published: (2024)

Towards a General Framework for Continual Learning with Pre-training
by: Wang, Liyuan, et al.
Published: (2023)

GraphGPT: Generative Pre-trained Graph Eulerian Transformer
by: Zhao, Qifang, et al.
Published: (2023)

Pre-training for Recommendation Unlearning
by: Chen, Guoxuan, et al.
Published: (2025)

DeepRV: Accelerating Spatiotemporal Inference with Pre-trained Neural Priors
by: Navott, Jhonathan, et al.
Published: (2025)

Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
by: Zheng, Kaiwen, et al.
Published: (2023)

FedCG: Leverage Conditional GAN for Protecting Privacy and Maintaining Competitive Performance in Federated Learning
by: Wu, Yuezhou, et al.
Published: (2021)

Homeostasis and Sparsity in Transformer
by: Kotyuzanskiy, Leonid, et al.
Published: (2024)

PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2024)

SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
by: Zhang, Jintao, et al.
Published: (2025)