Bibliographic Details
Main Authors:	Kim, Ilmun; Wasserman, Larry; Balakrishnan, Sivaraman; Neykov, Matey
Format:	Preprint
Published:	2024
Subjects:	Statistics Theory; Methodology; Machine Learning
Online Access:	https://arxiv.org/abs/2402.18921

Abstract:

Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.

Similar Items