Salvato in:
| Autori principali: | , , , |
|---|---|
| Natura: | Preprint |
| Pubblicazione: |
2024
|
| Soggetti: | |
| Accesso online: | https://arxiv.org/abs/2402.18921 |
| Tags: |
Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
|
| _version_ | 1866916154601111552 |
|---|---|
| author | Kim, Ilmun Wasserman, Larry Balakrishnan, Sivaraman Neykov, Matey |
| author_facet | Kim, Ilmun Wasserman, Larry Balakrishnan, Sivaraman Neykov, Matey |
| contents | Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2402_18921 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | Semi-Supervised U-statistics Kim, Ilmun Wasserman, Larry Balakrishnan, Sivaraman Neykov, Matey Statistics Theory Methodology Machine Learning Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework. |
| title | Semi-Supervised U-statistics |
| topic | Statistics Theory Methodology Machine Learning |
| url | https://arxiv.org/abs/2402.18921 |