:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Lu; Shi, Rong; Zhang, Shaoqing; Su, Shangchao; Yin, Ziqing; Cui, Zhiyan; Sun, Hongfeng; He, Baoguo; Chen, Yueqiang; Dong, Liang; Li, Xiyuan; Wang, Lingbin; Ma, Lijun; Huang, Qiang; Liu, Ting; Wang, Chong; Wei, Can
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2511.09837

Similar Items

MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training
by: Zhao, Lu, et al.
Published: (2025)

Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs
by: Chen, Xing, et al.
Published: (2025)

Decentralized Proactive Model Offloading and Resource Allocation for Split and Federated Learning
by: Huang, Binbin, et al.
Published: (2024)

ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments
by: Li, Haley, et al.
Published: (2026)

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
by: Su, Shangchao, et al.
Published: (2023)

ReaLB: Real-Time Load Balancing for Multimodal MoE Inference
by: Wang, Yingping, et al.
Published: (2026)

UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training
by: Zheng, Size, et al.
Published: (2026)

LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
by: Nie, Xiaonan, et al.
Published: (2024)

OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)

HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments
by: He, Yongjun, et al.
Published: (2025)

Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)

When MoE Meets Blockchain: A Trustworthy Distributed Framework of Large Models
by: Zhu, Weihao, et al.
Published: (2025)

MOSS: A Large-scale Open Microscopic Traffic Simulation System
by: Zhang, Jun, et al.
Published: (2024)

ProMoE: Fast MoE-based LLM Serving using Proactive Caching
by: Song, Xiaoniu, et al.
Published: (2024)

Janus: Disaggregating Attention and Experts for Scalable MoE Inference
by: Zhang, Zhexiang, et al.
Published: (2025)

Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)

Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
by: Luo, Jiajun, et al.
Published: (2024)

Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
by: Hu, Tianlun, et al.
Published: (2026)

BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2025)

HarMoEny: Efficient Multi-GPU Inference of MoE Models
by: Doucet, Zachary, et al.
Published: (2025)

Joint Temporal-Structural Representation Learning for Distributed Fault Discrimination in Microservice Architectures
by: Xue, Yihan, et al.
Published: (2026)

Local Gradient Regulation Stabilizes Federated Learning under Client Heterogeneity
by: Luo, Ping, et al.
Published: (2026)

Radio Labeling of Strong Prismatic Network With Star
by: Wang, Liming, et al.
Published: (2026)

SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)

Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems
by: Liu, Guowei, et al.
Published: (2026)

Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)

Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
by: Li, Junjie, et al.
Published: (2024)

GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)

MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism
by: Zhang, Zheng, et al.
Published: (2025)

Argus: Token Aware Distributed LLM Inference Optimization
by: Wu, Panlong, et al.
Published: (2025)

Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks
by: Shi, Long, et al.
Published: (2025)

Prefill-Decode Aggregation or Disaggregation? Unifying Both for Goodput-Optimized LLM Serving
by: Wang, Chao, et al.
Published: (2025)

FedFa: A Fully Asynchronous Training Paradigm for Federated Learning
by: Xu, Haotian, et al.
Published: (2024)

MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)

DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence
by: Zhang, Hanze, et al.
Published: (2025)

Unifying Partial Synchrony
by: Constantinescu, Andrei, et al.
Published: (2024)

Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models
by: Wang, Wei, et al.
Published: (2024)

LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
by: Liu, Xinyi, et al.
Published: (2026)

Taurus Database: How to be Fast, Available, and Frugal in the Cloud
by: Depoutovitch, Alex, et al.
Published: (2024)

Federated Learning with Bilateral Curation for Partially Class-Disjoint Data
by: Fan, Ziqing, et al.
Published: (2024)