:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Yu, Dianhai, Shen, Liang, Hao, Hongxiang, Gong, Weibao, Wu, Huachao, Bian, Jiang, Dai, Lirong, Xiong, Haoyi
Format:	Preprint
Published:	2022
Subjects:	Distributed, Parallel, and Cluster Computing Artificial Intelligence
Online Access:	https://arxiv.org/abs/2205.10034
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
by: Tairin, Suraiya, et al.
Published: (2025)

ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models
by: Singh, Gursimran, et al.
Published: (2025)

FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training
by: Gao, Yunqi, et al.
Published: (2025)

Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks
by: Shi, Long, et al.
Published: (2025)

HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)

LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
by: Liu, Xinyi, et al.
Published: (2026)

HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
by: Wu, Yongji, et al.
Published: (2025)

OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)

Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)

MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training
by: Cai, Weilin, et al.
Published: (2024)

EC2MoE: Adaptive End-Cloud Pipeline Collaboration Enabling Scalable Mixture-of-Experts Inference
by: Yang, Zheming, et al.
Published: (2025)

Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
by: Jin, Chao, et al.
Published: (2025)

ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference
by: Shen, Zixu, et al.
Published: (2025)

Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
by: Chen, Jiu, et al.
Published: (2026)

Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
by: Liu, Ziming, et al.
Published: (2025)

Janus: Disaggregating Attention and Experts for Scalable MoE Inference
by: Zhang, Zhexiang, et al.
Published: (2025)

GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)

Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)

Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing
by: Liu, Mengfan, et al.
Published: (2025)

Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony
by: Wang, Shaoyu, et al.
Published: (2025)

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)

Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks
by: Wang, Zhanwei, et al.
Published: (2026)

Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
by: Pan, Xinglin, et al.
Published: (2025)

HarMoEny: Efficient Multi-GPU Inference of MoE Models
by: Doucet, Zachary, et al.
Published: (2025)

SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling
by: Skiadopoulos, Athinagoras, et al.
Published: (2025)

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
by: Go, Seokjin, et al.
Published: (2025)

Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge
by: Qin, Shengling, et al.
Published: (2025)

Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)

MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)

UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training
by: Zheng, Size, et al.
Published: (2026)

Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
by: Luo, Shuqing, et al.
Published: (2025)

MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models
by: Xia, Yuchen, et al.
Published: (2025)

DuoServe-MoE: Dual-Phase Expert Prefetch and Caching for LLM Inference QoS Assurance
by: Zhang, Yuning, et al.
Published: (2025)

ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
by: Shi, Ge, et al.
Published: (2025)

Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models
by: Wu, Yongji, et al.
Published: (2024)

Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection
by: Gupta, Vima, et al.
Published: (2024)

MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts
by: Wang, Wenfeng, et al.
Published: (2025)