:: Library Catalog

Bibliographic Details
Main Authors:	Yun, Longfei; Zhuang, Yonghao; Fu, Yao; Xing, Eric P.; Zhang, Hao
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2404.02852

Similar Items

Mixture of Experts in Large Language Models
by: Zhang, Danyang, et al.
Published: (2025)

Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching
by: Li, Aihua
Published: (2026)

A Closer Look into Mixture-of-Experts in Large Language Models
by: Lo, Ka Man, et al.
Published: (2024)

Bayesian Mixture of Experts For Large Language Models
by: Dialameh, Maryam, et al.
Published: (2025)

Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints
by: Liew, Seng Pei, et al.
Published: (2026)

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)

HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
by: Zhao, Hao, et al.
Published: (2024)

Efficient Long-context Language Model Training by Core Attention Disaggregation
by: Zhuang, Yonghao, et al.
Published: (2025)

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
by: Liu, Enshu, et al.
Published: (2024)

Upcycling Large Language Models into Mixture of Experts
by: He, Ethan, et al.
Published: (2024)

MC#: Mixture Compressor for Mixture-of-Experts Large Models
by: Huang, Wei, et al.
Published: (2025)

A Survey on Mixture of Experts in Large Language Models
by: Cai, Weilin, et al.
Published: (2024)

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
by: Yao, Jinghan, et al.
Published: (2024)

Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)

MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
by: Xie, Zhitian, et al.
Published: (2024)

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
by: Hu, Xing, et al.
Published: (2025)

Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models
by: Tang, Yuanbo, et al.
Published: (2025)

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
by: Lu, Xudong, et al.
Published: (2024)

Pruning and Distilling Mixture-of-Experts into Dense Language Models
by: Kim, Junhyuck, et al.
Published: (2026)

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)

Prediction-powered Inference by Mixture of Experts
by: Gu, Yanwu, et al.
Published: (2026)

RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs
by: Tan, Bowen, et al.
Published: (2023)

Towards Optimizing with Large Language Models
by: Guo, Pei-Fu, et al.
Published: (2023)

BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference
by: Wang, Yun, et al.
Published: (2025)

Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts
by: Zhang, Di, et al.
Published: (2025)

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
by: Kim, Gyeongman, et al.
Published: (2025)

DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
by: Aghdam, Maryam Akhavan, et al.
Published: (2024)

On Optimizing the Communication of Model Parallelism
by: Zhuang, Yonghao, et al.
Published: (2022)

WDMoE: Wireless Distributed Large Language Models with Mixture of Experts
by: Xue, Nan, et al.
Published: (2024)

Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
by: Yan, Jiaming, et al.
Published: (2025)

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
by: Zhao, Zhongyu, et al.
Published: (2024)

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
by: He, Neil, et al.
Published: (2025)

Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
by: Yun, Sungmin, et al.
Published: (2024)

EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
by: Chen, Yuanteng, et al.
Published: (2025)

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
by: Zeng, Chao, et al.
Published: (2024)

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
by: Wang, Zihan, et al.
Published: (2025)

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
by: Huang, Yongqi, et al.
Published: (2025)