Bibliographic Details
Main Authors: Lautrup, Anton Danholt, Hyrup, Tobias, Zimek, Arthur, Schneider-Kamp, Peter
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2404.15821
_version_ 1866909414588416000
author Lautrup, Anton Danholt
Hyrup, Tobias
Zimek, Arthur
Schneider-Kamp, Peter
contents With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.
format Preprint
id arxiv_https___arxiv_org_abs_2404_15821
institution arXiv
publishDate 2024
record_format arxiv
title SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data
topic Machine Learning
Performance
url https://arxiv.org/abs/2404.15821