Bibliographic Details
Main Authors:	Kim, Ilmun; Wasserman, Larry; Balakrishnan, Sivaraman; Neykov, Matey
Format:	Preprint
Published:	2024
Subjects:	Statistics Theory; Methodology; Machine Learning
Online Access:	https://arxiv.org/abs/2402.18921

_version_	1866916154601111552
author	Kim, Ilmun; Wasserman, Larry; Balakrishnan, Sivaraman; Neykov, Matey
author_facet	Kim, Ilmun; Wasserman, Larry; Balakrishnan, Sivaraman; Neykov, Matey
contents	Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_18921
institution	arXiv
publishDate	2024
record_format	arxiv
title	Semi-Supervised U-statistics
topic	Statistics Theory; Methodology; Machine Learning
url	https://arxiv.org/abs/2402.18921
