
Bibliographic Details
Main Authors:	Sidorenko, Andrey; Platzer, Michael; Scriminaci, Mario; Tiwald, Paul
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.01908

_version_	1866913773192740864
author	Sidorenko, Andrey; Platzer, Michael; Scriminaci, Mario; Tiwald, Paul
author_facet	Sidorenko, Andrey; Platzer, Michael; Scriminaci, Mario; Tiwald, Paul
contents	Evaluating the quality of synthetic data remains a key challenge for ensuring privacy and utility in data-driven research. In this work, we present an evaluation framework that quantifies how well synthetic data replicates original distributional properties while ensuring privacy. The proposed approach employs a holdout-based benchmarking strategy that facilitates quantitative assessment through low- and high-dimensional distribution comparisons, embedding-based similarity measures, and nearest-neighbor distance metrics. The framework supports various data types and structures, including sequential and contextual information, and enables interpretable quality diagnostics through a set of standardized metrics. These contributions aim to support reproducibility and methodological consistency in benchmarking of synthetic data generation techniques. The code of the framework is available at https://github.com/mostly-ai/mostlyai-qa.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_01908
institution	arXiv
publishDate	2025
record_format	arxiv
title	Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2504.01908

Similar Items