Staff View: MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale

Bibliographic Details
Main Authors:	Falke, Tobias; Anastassacos, Nicolas; Tan, Samson; Meas, Chankrisna Richy; Prakash, Chandana Satya; Sekhar, Nitesh; Bari, M Saiful; Kompella, Krishna; Elsayed, Gamaleldin F.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.07030

_version_	1866918434819801088
author	Falke, Tobias; Anastassacos, Nicolas; Tan, Samson; Meas, Chankrisna Richy; Prakash, Chandana Satya; Sekhar, Nitesh; Bari, M Saiful; Kompella, Krishna; Elsayed, Gamaleldin F.
author_facet	Falke, Tobias; Anastassacos, Nicolas; Tan, Samson; Meas, Chankrisna Richy; Prakash, Chandana Satya; Sekhar, Nitesh; Bari, M Saiful; Kompella, Krishna; Elsayed, Gamaleldin F.
contents	Sparse Mixture-of-Experts (MoE) architectures are increasingly popular for frontier large language models (LLMs), but they introduce training challenges due to routing complexity. Fully leveraging the parameters of an MoE model requires all experts to be well-trained and to specialize in non-redundant ways. Assessing this, however, is complicated by the lack of established metrics and, importantly, by the fact that many routing techniques exhibit similar performance at smaller sizes, which often does not reflect their behavior at large scale. To address this challenge, we propose the MoE Routing Testbed, a setup that gives clearer visibility into routing dynamics at small scale while using realistic data. The testbed pairs a data mix of clearly distinguishable domains with a reference router that prescribes ideal routing based on these domains, providing a well-defined upper bound for comparison. This enables quantifiable measurement of expert specialization. To demonstrate the value of the testbed, we compare various MoE routing approaches and show that balancing scope is the crucial factor that allows specialization while maintaining high expert utilization. We confirm that this observation generalizes to models 35x larger.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_07030
institution	arXiv
publishDate	2026
record_format	arxiv
title	MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale
topic	Machine Learning
url	https://arxiv.org/abs/2604.07030
