Bibliographic details
Main Authors: Nathan, R. P., Nikolaou, Nikolaos, Lahav, Ofer
Format: Preprint
Published: 2025
Subjects: Machine Learning; Cosmology and Nongalactic Astrophysics
Online access: https://arxiv.org/abs/2502.04310
_version_ 1866912223136317440
author Nathan, R. P.
Nikolaou, Nikolaos
Lahav, Ofer
author_facet Nathan, R. P.
Nikolaou, Nikolaos
Lahav, Ofer
contents Unsupervised machine learning methods are well suited to searching for anomalies at scale but can struggle with the high-dimensional representation of many modern datasets, hence dimensionality reduction (DR) is often performed first. In this paper we analyse unsupervised anomaly detection (AD) from the perspective of the manifold created in DR. We present an idealised illustration, "Finding Pegasus", and a novel formal framework with which we categorise AD methods and their results into "on manifold" and "off manifold". We define these terms and show how they differ. We then use this insight to develop an approach of combining AD methods which significantly boosts AD recall without sacrificing precision in situations employing high DR. When tested on MNIST data, our approach of combining AD methods improves recall by as much as 16 percent compared with simply combining with the best standalone AD method (Isolation Forest), a result which shows great promise for its application to real-world data.
format Preprint
id arxiv_https___arxiv_org_abs_2502_04310
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Finding Pegasus: Enhancing Unsupervised Anomaly Detection in High-Dimensional Data using a Manifold-Based Approach
Nathan, R. P.
Nikolaou, Nikolaos
Lahav, Ofer
Machine Learning
Cosmology and Nongalactic Astrophysics
title Finding Pegasus: Enhancing Unsupervised Anomaly Detection in High-Dimensional Data using a Manifold-Based Approach
topic Machine Learning
Cosmology and Nongalactic Astrophysics
url https://arxiv.org/abs/2502.04310