| Main Authors: | Hu, Yuezhou, Zhao, Kang, Huang, Weiyu, Chen, Jianfei, Zhu, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.01847 |
Similar Items
CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models
by: Huang, Weiyu, et al.
Published: (2025)
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
by: Hu, Yuezhou, et al.
Published: (2024)
Identifying Sensitive Weights via Post-quantization Integral
by: Hu, Yuezhou, et al.
Published: (2025)
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
by: Huang, Weiyu, et al.
Published: (2024)
Deterministic Differentiable Structured Pruning for Large Language Models
by: Huang, Weiyu, et al.
Published: (2026)
SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
by: Li, Kaiwei, et al.
Published: (2016)
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
by: Xi, Haocheng, et al.
Published: (2024)
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
by: Gao, Chang, et al.
Published: (2025)
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
by: Zhang, Jintao, et al.
Published: (2026)
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
by: Xi, Haocheng, et al.
Published: (2025)
To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training
by: Madhyastha, Meghana, et al.
Published: (2026)
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
by: Haziza, Daniel, et al.
Published: (2025)
Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)
Mixed Sparsity Training: Achieving 4$\times$ FLOP Reduction for Transformer Pretraining
by: Hu, Pihe, et al.
Published: (2024)
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
by: Zhang, Jintao, et al.
Published: (2024)
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
by: Zhang, Jintao, et al.
Published: (2025)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)
Graph Generative Pre-trained Transformer
by: Chen, Xiaohui, et al.
Published: (2025)
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
by: Wang, Ziteng, et al.
Published: (2024)
Efficient Backpropagation with Variance-Controlled Adaptive Sampling
by: Wang, Ziteng, et al.
Published: (2024)
LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model
by: Wang, Huizheng, et al.
Published: (2025)
BrainNPT: Pre-training of Transformer networks for brain network classification
by: Hu, Jinlong, et al.
Published: (2023)
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
by: Wang, Lirui, et al.
Published: (2024)
Split Adaptation for Pre-trained Vision Transformers
by: Wang, Lixu, et al.
Published: (2025)
Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-training
by: He, Yufei, et al.
Published: (2024)
Towards Pre-trained Graph Condensation via Optimal Transport
by: Yan, Yeyu, et al.
Published: (2025)
ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
by: Huang, Ning-Chi, et al.
Published: (2024)
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
by: Hu, Yuezhou, et al.
Published: (2025)
Timer: Generative Pre-trained Transformers Are Large Time Series Models
by: Liu, Yong, et al.
Published: (2024)
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
by: Pan, Zizheng, et al.
Published: (2024)
Towards a General Framework for Continual Learning with Pre-training
by: Wang, Liyuan, et al.
Published: (2023)
GraphGPT: Generative Pre-trained Graph Eulerian Transformer
by: Zhao, Qifang, et al.
Published: (2023)
Pre-training for Recommendation Unlearning
by: Chen, Guoxuan, et al.
Published: (2025)
DeepRV: Accelerating Spatiotemporal Inference with Pre-trained Neural Priors
by: Navott, Jhonathan, et al.
Published: (2025)
Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
by: Zheng, Kaiwen, et al.
Published: (2023)
FedCG: Leverage Conditional GAN for Protecting Privacy and Maintaining Competitive Performance in Federated Learning
by: Wu, Yuezhou, et al.
Published: (2021)
Homeostasis and Sparsity in Transformer
by: Kotyuzanskiy, Leonid, et al.
Published: (2024)
PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2024)
SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
by: Zhang, Jintao, et al.
Published: (2025)