Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Chicheng, Zhou, Yihan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.17607
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915354608926720
author	Zhang, Chicheng Zhou, Yihan
author_facet	Zhang, Chicheng Zhou, Yihan
contents	Multi-distribution learning extends agnostic Probably Approximately Correct (PAC) learning to the setting in which a family of $k$ distributions, $\{D_i\}_{i\in[k]}$, is considered and a classifier's performance is measured by its error under the worst distribution. This problem has attracted a lot of recent interests due to its applications in collaborative learning, fairness, and robustness. Despite a rather complete picture of sample complexity of passive multi-distribution learning, research on active multi-distribution learning remains scarce, with algorithms whose optimality remaining unknown. In this paper, we develop new algorithms for active multi-distribution learning and establish improved label complexity upper and lower bounds, in distribution-dependent and distribution-free settings. Specifically, in the near-realizable setting we prove an upper bound of $\widetilde{O}\Bigl(θ_{\max}(d+k)\ln\frac{1}{\varepsilon}\Bigr)$ and $\widetilde{O}\Bigl(θ_{\max}(d+k)\Bigl(\ln\frac{1}{\varepsilon}+\frac{ν^2}{\varepsilon^2}\Bigr)+\frac{kν}{\varepsilon^2}\Bigr)$ in the realizable and agnostic settings respectively, where $θ_{\max}$ is the maximum disagreement coefficient among the $k$ distributions, $d$ is the VC dimension of the hypothesis class, $ν$ is the multi-distribution error of the best hypothesis, and $\varepsilon$ is the target excess error. Moreover, we show that the bound in the realizable setting is information-theoretically optimal and that the $kν/\varepsilon^2$ term in the agnostic setting is fundamental for proper learners. We also establish instance-dependent sample complexity bound for passive multidistribution learning that smoothly interpolates between realizable and agnostic regimes~\citep{blum2017collaborative,zhang2024optimal}, which may be of independent interest.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_17607
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Towards Fundamental Limits for Active Multi-distribution Learning Zhang, Chicheng Zhou, Yihan Machine Learning Multi-distribution learning extends agnostic Probably Approximately Correct (PAC) learning to the setting in which a family of $k$ distributions, $\{D_i\}_{i\in[k]}$, is considered and a classifier's performance is measured by its error under the worst distribution. This problem has attracted a lot of recent interests due to its applications in collaborative learning, fairness, and robustness. Despite a rather complete picture of sample complexity of passive multi-distribution learning, research on active multi-distribution learning remains scarce, with algorithms whose optimality remaining unknown. In this paper, we develop new algorithms for active multi-distribution learning and establish improved label complexity upper and lower bounds, in distribution-dependent and distribution-free settings. Specifically, in the near-realizable setting we prove an upper bound of $\widetilde{O}\Bigl(θ_{\max}(d+k)\ln\frac{1}{\varepsilon}\Bigr)$ and $\widetilde{O}\Bigl(θ_{\max}(d+k)\Bigl(\ln\frac{1}{\varepsilon}+\frac{ν^2}{\varepsilon^2}\Bigr)+\frac{kν}{\varepsilon^2}\Bigr)$ in the realizable and agnostic settings respectively, where $θ_{\max}$ is the maximum disagreement coefficient among the $k$ distributions, $d$ is the VC dimension of the hypothesis class, $ν$ is the multi-distribution error of the best hypothesis, and $\varepsilon$ is the target excess error. Moreover, we show that the bound in the realizable setting is information-theoretically optimal and that the $kν/\varepsilon^2$ term in the agnostic setting is fundamental for proper learners. We also establish instance-dependent sample complexity bound for passive multidistribution learning that smoothly interpolates between realizable and agnostic regimes~\citep{blum2017collaborative,zhang2024optimal}, which may be of independent interest.
title	Towards Fundamental Limits for Active Multi-distribution Learning
topic	Machine Learning
url	https://arxiv.org/abs/2506.17607

Similar Items