| Main Authors: | Yu, Dianhai, Shen, Liang, Hao, Hongxiang, Gong, Weibao, Wu, Huachao, Bian, Jiang, Dai, Lirong, Xiong, Haoyi |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2205.10034 |
Similar Items
Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
by: Tairin, Suraiya, et al.
Published: (2025)
ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models
by: Singh, Gursimran, et al.
Published: (2025)
FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training
by: Gao, Yunqi, et al.
Published: (2025)
Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks
by: Shi, Long, et al.
Published: (2025)
HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)
LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
by: Liu, Xinyi, et al.
Published: (2026)
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
by: Wu, Yongji, et al.
Published: (2025)
OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)
Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)
MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training
by: Cai, Weilin, et al.
Published: (2024)
EC2MoE: Adaptive End-Cloud Pipeline Collaboration Enabling Scalable Mixture-of-Experts Inference
by: Yang, Zheming, et al.
Published: (2025)
Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
by: Jin, Chao, et al.
Published: (2025)
ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference
by: Shen, Zixu, et al.
Published: (2025)
Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
by: Chen, Jiu, et al.
Published: (2026)
Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
by: Liu, Ziming, et al.
Published: (2025)
Janus: Disaggregating Attention and Experts for Scalable MoE Inference
by: Zhang, Zhexiang, et al.
Published: (2025)
GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)
Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing
by: Liu, Mengfan, et al.
Published: (2025)
Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony
by: Wang, Shaoyu, et al.
Published: (2025)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)
Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)
SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks
by: Wang, Zhanwei, et al.
Published: (2026)
Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
by: Pan, Xinglin, et al.
Published: (2025)
HarMoEny: Efficient Multi-GPU Inference of MoE Models
by: Doucet, Zachary, et al.
Published: (2025)
SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling
by: Skiadopoulos, Athinagoras, et al.
Published: (2025)
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
by: Go, Seokjin, et al.
Published: (2025)
Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge
by: Qin, Shengling, et al.
Published: (2025)
Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)
MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)
UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training
by: Zheng, Size, et al.
Published: (2026)
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
by: Luo, Shuqing, et al.
Published: (2025)
MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models
by: Xia, Yuchen, et al.
Published: (2025)
DuoServe-MoE: Dual-Phase Expert Prefetch and Caching for LLM Inference QoS Assurance
by: Zhang, Yuning, et al.
Published: (2025)
ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
by: Shi, Ge, et al.
Published: (2025)
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models
by: Wu, Yongji, et al.
Published: (2024)
Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection
by: Gupta, Vima, et al.
Published: (2024)
MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts
by: Wang, Wenfeng, et al.
Published: (2025)