Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Liu, Guowei, Li, Hongming, Guo, Yaning, Lyu, Yongxi, Zhou, Mo, Liu, Yi, Li, Zhaogeng, Wang, Yanpeng
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2602.09721
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Deploying large-scale MoE models presents challenges in memory capacity and bandwidth for expert activation. While Attention-FFN Disaggregation (AFD) has emerged as a potential architecture to decouple compute and memory resources, its performance boundaries compared to standard large-scale Expert Parallelism (EP) remain underexplored. In this paper, we conduct a systematic analysis of AFD by extending the roofline model to the communication level, correlating interconnect bandwidth, arithmetic intensity, and Hardware FLOPS Utilization (HFU). Our analysis reveals a dead zone on standard clusters: increasing FFN instance count fails to improve HFU as computational workload is capped by scale-out bandwidth, causing operator active time to shrink relative to the fixed latency budget. We further show that AFD's discrete node-level scaling incurs higher imbalance penalties than EP's continuous batch adjustment. Nevertheless, these limitations diminish under specific conditions: Superpod-class hardware with abundant interconnect bandwidth and models with coarse-grained experts and lower sparsity are more likely to benefit from AFD. These findings position AFD as a promising approach for specific hardware-model combinations rather than a universal solution.

Similar Items