Bibliographic Details
Main Authors: Xu, Sascha; Walter, Nils Philipp; Kalofolias, Janis; Vreeken, Jilles
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2402.12930
author Xu, Sascha
Walter, Nils Philipp
Kalofolias, Janis
Vreeken, Jilles
contents Finding and describing sub-populations that are exceptional regarding a target property has important applications in many scientific disciplines, from identifying disadvantaged demographic groups in census data to finding conductive molecules within gold nanoparticles. Current approaches to finding such subgroups require pre-discretized predictive variables, do not permit non-trivial target distributions, do not scale to large datasets, and struggle to find diverse results. To address these limitations, we propose Syflow, an end-to-end optimizable approach in which we leverage normalizing flows to model arbitrary target distributions, and introduce a novel neural layer that results in easily interpretable subgroup descriptions. We demonstrate on synthetic and real-world data, including a case study, that Syflow reliably finds highly exceptional subgroups accompanied by insightful descriptions.
format Preprint
id arxiv_https___arxiv_org_abs_2402_12930
institution arXiv
publishDate 2024
record_format arxiv
title Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence
topic Machine Learning
url https://arxiv.org/abs/2402.12930