Bibliographic Details
Main authors:	Chen, Winston; Jiang, Yifan; Noble, William Stafford; Lu, Yang Young
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Applications
Online access:	https://arxiv.org/abs/2408.17016

_version_	1866912231790215168
author	Chen, Winston; Jiang, Yifan; Noble, William Stafford; Lu, Yang Young
author_facet	Chen, Winston; Jiang, Yifan; Noble, William Stafford; Lu, Yang Young
contents	Machine learning (ML) models are powerful tools for detecting complex patterns within data, yet their "black box" nature limits their interpretability, hindering their use in critical domains like healthcare and finance. To address this challenge, interpretable ML methods have been developed to explain how features influence model predictions. However, these methods often focus on univariate feature importance, overlooking the complex interactions between features that ML models are capable of capturing. Recognizing this limitation, recent efforts have aimed to extend these methods to discover feature interactions, but existing approaches struggle with robustness and error control, especially under data perturbations. In this study, we introduce Diamond, a novel method for trustworthy feature interaction discovery. Diamond uniquely integrates the model-X knockoffs framework to control the false discovery rate (FDR), ensuring that the proportion of falsely discovered interactions remains low. A key innovation in Diamond is its non-additivity distillation procedure, which refines existing interaction importance measures to distill non-additive interaction effects, ensuring that FDR control is maintained. This approach addresses the limitations of off-the-shelf interaction measures, which, when used naively, can lead to inaccurate discoveries. Diamond's applicability spans a wide range of ML models, including deep neural networks, transformer models, tree-based models, and factorization-based models. Our empirical evaluations on both simulated and real datasets across various biomedical studies demonstrate Diamond's utility in enabling more reliable data-driven scientific discoveries. This method represents a significant step forward in the deployment of ML models for scientific innovation and hypothesis generation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_17016
institution	arXiv
publishDate	2024
record_format	arxiv
title	Error-controlled non-additive interaction discovery in machine learning models
topic	Machine Learning Applications
url	https://arxiv.org/abs/2408.17016