| Main Authors: | Yun, Longfei, Zhuang, Yonghao, Fu, Yao, Xing, Eric P, Zhang, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.02852 |
Similar Items
Mixture of Experts in Large Language Models
by: Zhang, Danyang, et al.
Published: (2025)
Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching
by: Li, Aihua
Published: (2026)
A Closer Look into Mixture-of-Experts in Large Language Models
by: Lo, Ka Man, et al.
Published: (2024)
Bayesian Mixture of Experts For Large Language Models
by: Dialameh, Maryam, et al.
Published: (2025)
Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints
by: Liew, Seng Pei, et al.
Published: (2026)
MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
by: Zhao, Hao, et al.
Published: (2024)
Efficient Long-context Language Model Training by Core Attention Disaggregation
by: Zhuang, Yonghao, et al.
Published: (2025)
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
by: Liu, Enshu, et al.
Published: (2024)
Upcycling Large Language Models into Mixture of Experts
by: He, Ethan, et al.
Published: (2024)
MC#: Mixture Compressor for Mixture-of-Experts Large Models
by: Huang, Wei, et al.
Published: (2025)
A Survey on Mixture of Experts in Large Language Models
by: Cai, Weilin, et al.
Published: (2024)
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
by: Yao, Jinghan, et al.
Published: (2024)
Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)
MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
by: Xie, Zhitian, et al.
Published: (2024)
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
by: Hu, Xing, et al.
Published: (2025)
Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models
by: Tang, Yuanbo, et al.
Published: (2025)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
by: Lu, Xudong, et al.
Published: (2024)
Pruning and Distilling Mixture-of-Experts into Dense Language Models
by: Kim, Junhyuck, et al.
Published: (2026)
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)
Prediction-powered Inference by Mixture of Experts
by: Gu, Yanwu, et al.
Published: (2026)
RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
by: Tan, Bowen, et al.
Published: (2023)
Towards Optimizing with Large Language Models
by: Guo, Pei-Fu, et al.
Published: (2023)
BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference
by: Wang, Yun, et al.
Published: (2025)
Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts
by: Zhang, Di, et al.
Published: (2025)
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
by: Kim, Gyeongman, et al.
Published: (2025)
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
by: Aghdam, Maryam Akhavan, et al.
Published: (2024)
On Optimizing the Communication of Model Parallelism
by: Zhuang, Yonghao, et al.
Published: (2022)
WDMoE: Wireless Distributed Large Language Models with Mixture of Experts
by: Xue, Nan, et al.
Published: (2024)
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
by: Yan, Jiaming, et al.
Published: (2025)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
by: Zhao, Zhongyu, et al.
Published: (2024)
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
by: He, Neil, et al.
Published: (2025)
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
by: Yun, Sungmin, et al.
Published: (2024)
EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
by: Chen, Yuanteng, et al.
Published: (2025)
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
by: Zeng, Chao, et al.
Published: (2024)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
by: Wang, Zihan, et al.
Published: (2025)
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
by: Huang, Yongqi, et al.
Published: (2025)