Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Huang, Yihong, Zhang, Yuang, Wang, Liping, Zhang, Fan, Lin, Xuemin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2405.12502
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911937284014080
author	Huang, Yihong Zhang, Yuang Wang, Liping Zhang, Fan Lin, Xuemin
author_facet	Huang, Yihong Zhang, Yuang Wang, Liping Zhang, Fan Lin, Xuemin
contents	Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible. Instead of relying on clean datasets, some approaches directly train and detect on unlabeled contaminated datasets, leading to the need for methods that are robust to such conditions. Ensemble methods emerged as a superior solution to enhance model robustness against contaminated training sets. However, the training time is greatly increased by the ensemble. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degradation. Initially, we noted that blending normal and anomalous data causes AUC fluctuations, a label-dependent measure of detection accuracy. To circumvent the need for labels, we propose a zero-label entropy metric named Loss Entropy for loss distribution, enabling us to infer optimal stopping points for training without labels. Meanwhile, we theoretically demonstrate negative correlation between entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that AutoEncoder (AE) enhanced by our approach not only achieves better performance than ensemble AEs but also requires under 2\% of training time. Lastly, our proposed metric and early-stopping approach are evaluated on other deep OD models, exhibiting their broad potential applicability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_12502
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy Huang, Yihong Zhang, Yuang Wang, Liping Zhang, Fan Lin, Xuemin Machine Learning Artificial Intelligence Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible. Instead of relying on clean datasets, some approaches directly train and detect on unlabeled contaminated datasets, leading to the need for methods that are robust to such conditions. Ensemble methods emerged as a superior solution to enhance model robustness against contaminated training sets. However, the training time is greatly increased by the ensemble. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degradation. Initially, we noted that blending normal and anomalous data causes AUC fluctuations, a label-dependent measure of detection accuracy. To circumvent the need for labels, we propose a zero-label entropy metric named Loss Entropy for loss distribution, enabling us to infer optimal stopping points for training without labels. Meanwhile, we theoretically demonstrate negative correlation between entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that AutoEncoder (AE) enhanced by our approach not only achieves better performance than ensemble AEs but also requires under 2\% of training time. Lastly, our proposed metric and early-stopping approach are evaluated on other deep OD models, exhibiting their broad potential applicability.
title	EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2405.12502

Similar Items