Salvato in:
Dettagli Bibliografici
Autori principali: Kim, Ilmun, Wasserman, Larry, Balakrishnan, Sivaraman, Neykov, Matey
Natura: Preprint
Pubblicazione: 2024
Soggetti:
Accesso online:https://arxiv.org/abs/2402.18921
Tags: Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
_version_ 1866916154601111552
author Kim, Ilmun
Wasserman, Larry
Balakrishnan, Sivaraman
Neykov, Matey
author_facet Kim, Ilmun
Wasserman, Larry
Balakrishnan, Sivaraman
Neykov, Matey
contents Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.
format Preprint
id arxiv_https___arxiv_org_abs_2402_18921
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Semi-Supervised U-statistics
Kim, Ilmun
Wasserman, Larry
Balakrishnan, Sivaraman
Neykov, Matey
Statistics Theory
Methodology
Machine Learning
Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.
title Semi-Supervised U-statistics
topic Statistics Theory
Methodology
Machine Learning
url https://arxiv.org/abs/2402.18921