| Main Authors: | Zhao, Lu; Shi, Rong; Zhang, Shaoqing; Su, Shangchao; Yin, Ziqing; Cui, Zhiyan; Sun, Hongfeng; He, Baoguo; Chen, Yueqiang; Dong, Liang; Li, Xiyuan; Wang, Lingbin; Ma, Lijun; Huang, Qiang; Liu, Ting; Wang, Chong; Wei, Can |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09837 |
Similar Items
MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training
by: Zhao, Lu, et al.
Published: (2025)
Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs
by: Chen, Xing, et al.
Published: (2025)
Decentralized Proactive Model Offloading and Resource Allocation for Split and Federated Learning
by: Huang, Binbin, et al.
Published: (2024)
ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments
by: Li, Haley, et al.
Published: (2026)
FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
by: Su, Shangchao, et al.
Published: (2023)
ReaLB: Real-Time Load Balancing for Multimodal MoE Inference
by: Wang, Yingping, et al.
Published: (2026)
UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training
by: Zheng, Size, et al.
Published: (2026)
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
by: Nie, Xiaonan, et al.
Published: (2024)
OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)
HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments
by: He, Yongjun, et al.
Published: (2025)
Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)
When MoE Meets Blockchain: A Trustworthy Distributed Framework of Large Models
by: Zhu, Weihao, et al.
Published: (2025)
MOSS: A Large-scale Open Microscopic Traffic Simulation System
by: Zhang, Jun, et al.
Published: (2024)
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
by: Song, Xiaoniu, et al.
Published: (2024)
Janus: Disaggregating Attention and Experts for Scalable MoE Inference
by: Zhang, Zhexiang, et al.
Published: (2025)
Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)
Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
by: Luo, Jiajun, et al.
Published: (2024)
Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
by: Hu, Tianlun, et al.
Published: (2026)
BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2025)
HarMoEny: Efficient Multi-GPU Inference of MoE Models
by: Doucet, Zachary, et al.
Published: (2025)
Joint Temporal-Structural Representation Learning for Distributed Fault Discrimination in Microservice Architectures
by: Xue, Yihan, et al.
Published: (2026)
Local Gradient Regulation Stabilizes Federated Learning under Client Heterogeneity
by: Luo, Ping, et al.
Published: (2026)
Radio Labeling of Strong Prismatic Network With Star
by: Wang, Liming, et al.
Published: (2026)
SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)
Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems
by: Liu, Guowei, et al.
Published: (2026)
Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)
Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
by: Li, Junjie, et al.
Published: (2024)
GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)
MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism
by: Zhang, Zheng, et al.
Published: (2025)
Argus: Token Aware Distributed LLM Inference Optimization
by: Wu, Panlong, et al.
Published: (2025)
Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks
by: Shi, Long, et al.
Published: (2025)
Prefill-Decode Aggregation or Disaggregation? Unifying Both for Goodput-Optimized LLM Serving
by: Wang, Chao, et al.
Published: (2025)
FedFa: A Fully Asynchronous Training Paradigm for Federated Learning
by: Xu, Haotian, et al.
Published: (2024)
MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)
DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence
by: Zhang, Hanze, et al.
Published: (2025)
Unifying Partial Synchrony
by: Constantinescu, Andrei, et al.
Published: (2024)
Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models
by: Wang, Wei, et al.
Published: (2024)
LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
by: Liu, Xinyi, et al.
Published: (2026)
Taurus Database: How to be Fast, Available, and Frugal in the Cloud
by: Depoutovitch, Alex, et al.
Published: (2024)
Federated Learning with Bilateral Curation for Partially Class-Disjoint Data
by: Fan, Ziqing, et al.
Published: (2024)