Staff View: Hard and Soft EM in Bayesian Network Learning from Incomplete Data :: Library Catalog

Bibliographic Details
Main Authors:	Ruggieri, Andrea; Stranieri, Francesco; Stella, Fabio; Scutari, Marco
Format:	Preprint
Published:	2020
Subjects:	Machine Learning Methodology
Online Access:	https://arxiv.org/abs/2012.05269

_version_	1866915096260771840
author	Ruggieri, Andrea; Stranieri, Francesco; Stella, Fabio; Scutari, Marco
author_facet	Ruggieri, Andrea; Stranieri, Francesco; Stella, Fabio; Scutari, Marco
contents	Incomplete data are a common feature in many domains, from clinical trials to industrial applications. Bayesian networks (BNs) are often used in these domains because of their graphical and causal interpretations. BN parameter learning from incomplete data is usually implemented with the Expectation-Maximisation algorithm (EM), which computes the relevant sufficient statistics ("soft EM") using belief propagation. Similarly, the Structural Expectation-Maximisation algorithm (Structural EM) learns the network structure of the BN from those sufficient statistics using algorithms designed for complete data. However, practical implementations of parameter and structure learning often impute missing data ("hard EM") to compute sufficient statistics instead of using belief propagation, for both ease of implementation and computational speed. In this paper, we investigate the question: what is the impact of using imputation instead of belief propagation on the quality of the resulting BNs? From a simulation study using synthetic data and reference BNs, we find that it is possible to recommend one approach over the other in several scenarios based on the characteristics of the data. We then use this information to build a simple decision tree to guide practitioners in choosing the EM algorithm best suited to their problem.
format	Preprint
id	arxiv_https___arxiv_org_abs_2012_05269
institution	arXiv
publishDate	2020
record_format	arxiv
title	Hard and Soft EM in Bayesian Network Learning from Incomplete Data
topic	Machine Learning Methodology
url	https://arxiv.org/abs/2012.05269
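
Illustrative note on the abstract: the difference between "soft EM" and "hard EM" can be sketched on a toy problem. The example below is a minimal sketch, not the method or the experimental setup of the preprint. It assumes a two-node network X -> Y with binary variables in which some values of X are missing; in that single-edge case belief propagation reduces to Bayes' rule. The function names (posterior_x1, em), the toy data, and the starting parameters are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: two-node network X -> Y, binary variables,
# some values of X missing (None). Not the setup used in the preprint.

def posterior_x1(y, p_x, p_y):
    """P(X=1 | Y=y) by Bayes' rule (belief propagation on a single edge)."""
    like1 = p_x * (p_y[1] if y == 1 else 1.0 - p_y[1])
    like0 = (1.0 - p_x) * (p_y[0] if y == 1 else 1.0 - p_y[0])
    return like1 / (like1 + like0)


def em(data, hard, iters=50, p_x=0.5, p_y=(0.4, 0.6)):
    """Estimate P(X=1) and (P(Y=1|X=0), P(Y=1|X=1)) from incomplete data.

    hard=False: soft EM, missing X contributes fractional (expected) counts.
    hard=True:  hard EM, missing X is imputed with its most probable value.
    Assumes the observed rows contain both X=0 and X=1, so no count is zero.
    """
    n = len(data)
    for _ in range(iters):
        n1 = 0.0      # (expected) number of rows with X=1
        n1_y1 = 0.0   # (expected) number of rows with X=1 and Y=1
        n0_y1 = 0.0   # (expected) number of rows with X=0 and Y=1
        for x, y in data:
            if x is None:
                w = posterior_x1(y, p_x, p_y)       # E-step weight
                if hard:
                    w = 1.0 if w >= 0.5 else 0.0    # imputation
            else:
                w = float(x)
            n1 += w
            n1_y1 += w * y
            n0_y1 += (1.0 - w) * y
        # M-step: maximum-likelihood estimates from the (expected) counts
        n0 = n - n1
        p_x = n1 / n
        p_y = (n0_y1 / n0, n1_y1 / n1)
    return p_x, p_y


complete = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
incomplete = complete + [(None, 1), (None, 1), (None, 0), (None, 0)]

soft_estimate = em(incomplete, hard=False)
hard_estimate = em(incomplete, hard=True)
print("soft EM:", soft_estimate)
print("hard EM:", hard_estimate)
```

On fully observed data the two variants coincide, because there is nothing to impute or to weight; they can differ only through the rows with missing values, which is the comparison the abstract describes.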
