Bibliographic Details
Main Authors: Maddock, Samuel; Gade, Shripad; Cormode, Graham; Bullock, Will
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2511.09672
author Maddock, Samuel
Gade, Shripad
Cormode, Graham
Bullock, Will
contents State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit graphical models to produce synthetic data, enabling systematic optimisation of data quality under privacy constraints. Graphical models, however, are inefficient for high-dimensional data because they require substantial memory and must be retrained from scratch whenever the graph structure changes, leading to significant computational overhead. Recent methods, like GEM, overcome these limitations by using generator neural networks for improved scalability. However, empirical comparisons have mostly focused on small datasets, limiting real-world applicability. In this work, we introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results while efficiently handling datasets with over a hundred columns, where AIM fails due to memory and computational overheads.
format Preprint
id arxiv_https___arxiv_org_abs_2511_09672
institution arXiv
publishDate 2025
record_format arxiv
title GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks
topic Machine Learning
url https://arxiv.org/abs/2511.09672
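
The abstract in this record describes frameworks that iteratively measure low-order noisy marginals. As a rough illustration of that single "measure" step only, the Python sketch below counts the cells of a two-column marginal and adds Gaussian noise to each count. It is not the AIM, GEM, or GEM+ implementation: the function name `noisy_marginal`, the toy rows, the column domains, and the noise scale `sigma` are all assumptions made for this example, and choosing `sigma` to meet a given privacy budget is left out.

```python
import itertools
import random
from collections import Counter


def noisy_marginal(rows, cols, domains, sigma, rng):
    """Count every value combination of `cols`, then add Gaussian noise per cell.

    Hypothetical helper for illustration; not taken from any library.
    """
    # Exact contingency counts over the selected columns.
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    # Enumerate the full domain so that empty cells are also reported (and noised).
    cells = itertools.product(*(domains[c] for c in cols))
    return {cell: counts.get(cell, 0) + rng.gauss(0.0, sigma) for cell in cells}


# Toy dataset (assumed for illustration).
rows = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]
domains = {"age": ["young", "old"], "income": ["low", "high"]}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
measurement = noisy_marginal(rows, ("age", "income"), domains, sigma=1.0, rng=rng)

# With sigma = 0 the function returns the exact (non-private) marginal.
exact = noisy_marginal(rows, ("age", "income"), domains, sigma=0.0, rng=rng)
```

In a select-measure-generate loop, a measurement like this would be passed to whatever model is being fitted (a graphical model or a generator network, in the abstract's terms); that fitting step is not sketched here.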