Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09672 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911262195056640 |
|---|---|
| author | Maddock, Samuel Gade, Shripad Cormode, Graham Bullock, Will |
| author_facet | Maddock, Samuel Gade, Shripad Cormode, Graham Bullock, Will |
| contents | State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit graphical models to produce synthetic data, enabling systematic optimisation of data quality under privacy constraints. Graphical models, however, are inefficient for high-dimensional data because they require substantial memory and must be retrained from scratch whenever the graph structure changes, leading to significant computational overhead. Recent methods, like GEM, overcome these limitations by using generator neural networks for improved scalability. However, empirical comparisons have mostly focused on small datasets, limiting real-world applicability. In this work, we introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results while efficiently handling datasets with over a hundred columns, where AIM fails due to memory and computational overheads. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2511_09672 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks Maddock, Samuel Gade, Shripad Cormode, Graham Bullock, Will Machine Learning State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit graphical models to produce synthetic data, enabling systematic optimisation of data quality under privacy constraints. Graphical models, however, are inefficient for high-dimensional data because they require substantial memory and must be retrained from scratch whenever the graph structure changes, leading to significant computational overhead. Recent methods, like GEM, overcome these limitations by using generator neural networks for improved scalability. However, empirical comparisons have mostly focused on small datasets, limiting real-world applicability. In this work, we introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results while efficiently handling datasets with over a hundred columns, where AIM fails due to memory and computational overheads. |
| title | GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2511.09672 |