Bibliographic Details
Main Authors: Kim, Jang-Hyun; Gibbs, Claudia Skok; Yun, Sangdoo; Song, Hyun Oh; Cho, Kyunghyun
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2408.16218
author Kim, Jang-Hyun
Gibbs, Claudia Skok
Yun, Sangdoo
Song, Hyun Oh
Cho, Kyunghyun
contents We propose a novel machine learning approach for inferring the causal variables of a target variable from observations. Rather than reconstructing the full causal graph, which is computationally challenging in large-scale systems, we directly infer a set of causal factors. The identified causal set consists of all potential regulators of the target variable under experimental settings, enabling efficient regulation through intervention. To achieve this, we train a neural network with supervised learning on simulated data to infer causality. By employing a subsampled-ensemble inference strategy, our approach scales linearly in the number of variables and extends efficiently to thousands of variables. Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks, outperforming existing methods that emphasize full-graph discovery. We validate our model's generalization capability across out-of-distribution graph structures and generating mechanisms, including gene regulatory networks of E. coli and the human K562 cell line. The implementation code is available at https://github.com/snu-mllab/Targeted-Cause-Discovery.
format Preprint
id arxiv_https___arxiv_org_abs_2408_16218
institution arXiv
publishDate 2024
record_format arxiv
title Large-Scale Targeted Cause Discovery via Learning from Simulated Data
topic Machine Learning
url https://arxiv.org/abs/2408.16218