Staff View: A primer on synthetic health data :: Library Catalog

Bibliographic Details
Main Authors:	Bartell, Jennifer A; Valentin, Sander Boisen; Krogh, Anders; Langberg, Henning; Bøgsted, Martin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2401.17653

_version_	1866911941528649728
author	Bartell, Jennifer A; Valentin, Sander Boisen; Krogh, Anders; Langberg, Henning; Bøgsted, Martin
author_facet	Bartell, Jennifer A; Valentin, Sander Boisen; Krogh, Anders; Langberg, Henning; Bøgsted, Martin
contents	Recent advances in deep generative models have greatly expanded the potential to create realistic synthetic health datasets. These synthetic datasets aim to preserve the characteristics, patterns, and overall scientific conclusions derived from sensitive health datasets without disclosing patient identity or sensitive information. Thus, synthetic data can facilitate safe data sharing that supports a range of initiatives including the development of new predictive models, advanced health IT platforms, and general project ideation and hypothesis development. However, many questions and challenges remain, including how to consistently evaluate a synthetic dataset's similarity and predictive utility in comparison to the original real dataset and risk to privacy when shared. Additional regulatory and governance issues have not been widely addressed. In this primer, we map the state of synthetic health data, including generation and evaluation methods and tools, existing examples of deployment, the regulatory and ethical landscape, access and governance options, and opportunities for further development.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_17653
institution	arXiv
publishDate	2024
record_format	arxiv
title	A primer on synthetic health data
topic	Machine Learning
url	https://arxiv.org/abs/2401.17653