Bibliographic Details
Main Author:	Dar, Yehuda
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.03151

_version_	1866914072735252480
author	Dar, Yehuda
author_facet	Dar, Yehuda
contents	This paper uses classical high-rate quantization theory to provide new insights into mixture-of-experts (MoE) models for regression tasks. Our MoE is defined by a segmentation of the input space into regions, each with a single-parameter expert that acts as a constant predictor with zero-compute at inference. Motivated by high-rate quantization theory assumptions, we assume that the number of experts is sufficiently large to make their input-space regions very small. This lets us study the approximation error of our MoE model class: (i) for one-dimensional inputs, we formulate the test error and its minimizing segmentation and experts; (ii) for multidimensional inputs, we formulate an upper bound for the test error and study its minimization. Moreover, we consider the learning of the expert parameters from a training dataset, given an input-space segmentation, and formulate their statistical learning properties. This leads us to show, theoretically and empirically, how the tradeoff between approximation and estimation errors in MoE learning depends on the number of experts.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_03151
institution	arXiv
publishDate	2025
record_format	arxiv
title	Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective
topic	Machine Learning
url	https://arxiv.org/abs/2510.03151

Similar Items