Bibliographic Details
Main Authors: Massi, Michela C., Franco, Nicola R., Ieva, Francesca, Manzoni, Andrea, Paganoni, Anna Maria, Zunino, Paolo
Format: Preprint
Published: 2021
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2102.12974
contents Logistic Regression (LR) is a widely used statistical method in empirical binary classification studies. However, real-life scenarios often present complexities that rule out the as-is LR model and instead call for high-order interactions to capture data variability. This is made even more challenging by: (i) datasets growing wider, with ever more variables; (ii) studies typically being conducted in strongly imbalanced settings; (iii) sample sizes ranging from very large to extremely small; (iv) the need to provide both predictive models and interpretable results. In this paper we present a novel algorithm, Learning high-order Interactions via targeted Pattern Search (LIPS), which selects interaction terms of varying order to include in an LR model for an imbalanced binary classification task with categorical input data. LIPS's rationale stems from the duality between item sets and categorical interactions. The algorithm relies on an interaction learning step based on a well-known frequent item set mining algorithm, and on a novel dissimilarity-based interaction selection step that lets the user specify the number of interactions to include in the LR model. In addition, we present two variants (Scores LIPS and Clusters LIPS) tailored to more specific needs. Through a set of experiments we validate our algorithm and demonstrate its wide applicability to real-life research scenarios, showing that it outperforms a benchmark state-of-the-art algorithm.
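The duality mentioned above, where a categorical interaction such as (smoker = yes) x (age = old) corresponds to an item set of (variable, level) pairs, can be illustrated with a minimal Python sketch. It is not the LIPS implementation. The level-wise Apriori-style miner stands in for the "well-known frequent item set mining algorithm" the abstract does not name; the greedy max-min Jaccard selection is a hypothetical stand-in for the paper's dissimilarity-based step; and mining only minority-class samples is an assumption about what "targeted" might mean. All function names and the toy data are hypothetical.

```python
from itertools import combinations

# Hypothetical illustration, not the authors' code.
# An "item" is a (variable, level) pair; a sample is a transaction (set of items).
def to_transaction(sample):
    return frozenset(sample.items())

def frequent_itemsets(transactions, min_support, max_order):
    """Apriori-style level-wise search (a stand-in miner) for item sets
    whose support is >= min_support, up to size max_order."""
    n = len(transactions)
    result = {}
    current = [frozenset([i]) for i in {item for t in transactions for item in t}]
    order = 1
    while current and order <= max_order:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join step: merge frequent sets into candidates one item larger,
        # keeping only those whose every sub-set of size `order` is frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if len(u) == order + 1 and all(
                frozenset(s) in frequent for s in combinations(u, order)
            ):
                candidates.add(u)
        current = list(candidates)
        order += 1
    return result

def jaccard_distance(a, b):
    return 1 - len(a & b) / len(a | b)

def select_diverse(itemsets, k):
    """Hypothetical dissimilarity step: start from the most supported set,
    then greedily add the set farthest (max-min Jaccard) from those chosen."""
    ranked = sorted(itemsets, key=lambda s: (-itemsets[s], -len(s), sorted(s)))
    if not ranked:
        return []
    selected = [ranked[0]]
    while len(selected) < k and len(selected) < len(ranked):
        best = max(
            (s for s in ranked if s not in selected),
            key=lambda s: min(jaccard_distance(s, t) for t in selected),
        )
        selected.append(best)
    return selected

def interaction_features(samples, selected):
    """Each chosen item set becomes a 0/1 interaction column for the LR design matrix."""
    return [[int(s <= to_transaction(x)) for s in selected] for x in samples]

# Toy usage: mine item sets among positive (minority-class) samples only.
samples = [
    {"smoker": "yes", "sex": "M", "age": "old"},
    {"smoker": "yes", "sex": "F", "age": "old"},
    {"smoker": "yes", "sex": "M", "age": "young"},
    {"smoker": "no", "sex": "F", "age": "young"},
]
labels = [1, 1, 1, 0]
positives = [to_transaction(x) for x, y in zip(samples, labels) if y == 1]
freq = frequent_itemsets(positives, min_support=0.6, max_order=3)
# Keep only sets of order >= 2, assuming main effects enter the LR model separately.
interactions = {s: v for s, v in freq.items() if len(s) >= 2}
chosen = select_diverse(interactions, k=2)
X = interaction_features(samples, chosen)
```

Here the two chosen interactions, (smoker = yes, age = old) and (smoker = yes, sex = M), yield the columns `[[1, 1], [1, 0], [0, 1], [0, 0]]`, which would be appended to the main-effect columns before fitting the LR model. The max-min rule is only one way to favor non-redundant interactions; the paper's own dissimilarity measure and its Scores and Clusters variants may differ.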
id arxiv_https___arxiv_org_abs_2102_12974
institution arXiv
title Learning High-Order Interactions via Targeted Pattern Search
topic Machine Learning