Bibliographic Details
Main Authors:	Han, Yujin; Xu, Mingwenchan; Guan, Leying
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2302.02237

author	Han, Yujin; Xu, Mingwenchan; Guan, Leying
contents	The Random Forests classifier, a widely used off-the-shelf classification tool, assumes, like other standard classifiers, that training and test samples come from the same distribution. However, in safety-critical scenarios such as medical diagnosis and network attack detection, discrepancies between the training and test sets, including the potential presence of novel outlier samples that did not appear during training, can pose significant challenges. To address this problem, we introduce the Conformalized Semi-Supervised Random Forest (CSForest), which couples the conformalization technique Jackknife+aB with semi-supervised tree ensembles to construct a set-valued prediction $C(x)$. Instead of optimizing over the training distribution, CSForest uses unlabeled test samples to improve accuracy and flags unseen outliers by returning an empty set. Theoretically, we establish that CSForest covers the true labels of previously observed inlier classes under arbitrary label shift in the test data. We compare CSForest with state-of-the-art methods on synthetic examples and various real-world datasets, under different types of distribution change in the test domain. Our results highlight CSForest's effective prediction of inliers and its ability to detect outlier samples unique to the test data. In addition, CSForest maintains consistently good performance as the sizes of the training and test sets vary. Code for CSForest is available at https://github.com/yujinhan98/CSForest.
format	Preprint
id	arxiv_https___arxiv_org_abs_2302_02237
institution	arXiv
publishDate	2023
record_format	arxiv
title	Conformalized Semi-supervised Random Forest for Classification and Abnormality Detection
topic	Machine Learning
url	https://arxiv.org/abs/2302.02237
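The abstract in the `contents` field describes a set-valued prediction $C(x)$ that flags an unseen outlier by coming back empty. As a rough illustration of that output convention only, here is a minimal Python sketch of class-conditional split conformal prediction. It is not CSForest: it replaces the paper's Jackknife+aB procedure and semi-supervised forests with a single held-out calibration set, and it assumes nonconformity scores (for example, 1 minus a classifier's estimated probability of class k) are already computed. The function names `class_thresholds`, `prediction_set`, and `is_outlier` are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def class_thresholds(cal_scores: Dict[str, Sequence[float]], alpha: float) -> Dict[str, float]:
    """Per-class conformal threshold on nonconformity scores.

    cal_scores[k] holds the scores s(x_i, k) of calibration points whose true
    label is k (lower score = more typical of class k). Assumed inputs.
    """
    thresholds: Dict[str, float] = {}
    for label, scores in cal_scores.items():
        n = len(scores)
        # 1-based rank of the finite-sample (1 - alpha) quantile.
        rank = math.ceil((n + 1) * (1 - alpha))
        # If the rank exceeds n there are too few calibration points; the
        # threshold becomes +inf so the class is never excluded.
        thresholds[label] = sorted(scores)[rank - 1] if rank <= n else math.inf
    return thresholds


def prediction_set(test_scores: Dict[str, float], thresholds: Dict[str, float]) -> Set[str]:
    """C(x): every observed class whose score for x is within that class's threshold."""
    return {label for label, s in test_scores.items() if s <= thresholds[label]}


def is_outlier(pred_set: Set[str]) -> bool:
    """An empty prediction set means x fits none of the training classes."""
    return len(pred_set) == 0


if __name__ == "__main__":
    # Toy calibration scores, assumed to come from some fitted classifier.
    cal = {
        "cat": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45],
        "dog": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90],
    }
    th = class_thresholds(cal, alpha=0.15)
    print(prediction_set({"cat": 0.30, "dog": 0.95}, th))  # {'cat'}
    print(is_outlier(prediction_set({"cat": 0.80, "dog": 0.97}, th)))  # True
```

Calibrating one threshold per class means that, when calibration and test points of class k are exchangeable, each inlier class is covered with probability at least 1 - alpha no matter how often that class appears in the test data, which is why such sets tolerate label shift. A point that clears no class threshold gets an empty set and is reported as an outlier.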

Similar Items