Bibliographic Details
Main Authors: Su, Ye; Tang, Huayi; Gong, Zixuan; Liu, Yong
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.03204
author Su, Ye
Tang, Huayi
Gong, Zixuan
Liu, Yong
contents While Mixture-of-Experts (MoE) architectures define the state of the art, their success is often attributed to heuristic efficiency rather than geometric expressivity. In this work, we present the first analysis of MoE through the lens of tropical geometry, establishing that the Top-$k$ routing mechanism is algebraically isomorphic to the $k$-th elementary symmetric tropical polynomial. This isomorphism partitions the input space into the Normal Fan of a Hypersimplex, revealing that \textbf{sparsity is combinatorial depth}, which scales geometric capacity by the binomial coefficient $\binom{N}{k}$. Moving beyond ambient bounds, we introduce the concept of \textit{Effective Capacity} under the Manifold Hypothesis. We prove that while dense networks suffer from capacity collapse on low-dimensional data, MoE architectures exhibit \textit{Combinatorial Resilience}, maintaining high expressivity via the transversality of routing cones. Translating these theoretical bounds into architectural principles, we derive asymptotic capacity limits for optimal expert granularity and prove that shared experts are geometrically necessary to prevent routing collapse.
format Preprint
id arxiv_https___arxiv_org_abs_2602_03204
institution arXiv
publishDate 2026
record_format arxiv
title Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry
topic Machine Learning
url https://arxiv.org/abs/2602.03204
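The abstract's central identification can be illustrated numerically. The sketch below is not taken from the paper; it assumes the standard max-plus convention (tropical addition is `max`, tropical multiplication is ordinary `+`), under which the $k$-th elementary symmetric tropical polynomial of $N$ router scores equals the sum of the $k$ largest scores. It also counts the distinct expert subsets that Top-$k$ selection can produce, which matches $\binom{N}{k}$. The function names and the integer scores are hypothetical and chosen for illustration only.

```python
from itertools import combinations, permutations
from math import comb


def tropical_elementary_symmetric(scores, k):
    """k-th elementary symmetric polynomial in the max-plus semiring.

    Assumed convention: tropical sum is max, tropical product is +,
    so the polynomial is the max over all k-subsets of their ordinary sum.
    """
    return max(sum(subset) for subset in combinations(scores, k))


def top_k_experts(scores, k):
    """Indices of the k largest scores (ties broken by lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return frozenset(order[:k])


def count_routing_regions(n, k):
    """Count distinct expert subsets reachable by Top-k routing.

    Every strict ordering of n scores is tried; each ordering selects
    one k-subset, and the distinct subsets are counted.
    """
    return len({top_k_experts(p, k) for p in permutations(range(n))})


# Hypothetical router scores for N = 4 experts (integers avoid float noise).
scores = [3, -12, 20, 7]
k = 2

chosen = top_k_experts(scores, k)
print(tropical_elementary_symmetric(scores, k))  # 27
print(sum(scores[i] for i in chosen))            # 27
print(sorted(chosen))                            # [2, 3]
print(count_routing_regions(4, 2), comb(4, 2))   # 6 6
```

The brute-force enumeration is exponential in $N$ and is meant only as a check on small cases; it says nothing about the paper's capacity bounds beyond the subset count.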