Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Cai, Kuntai, Xiao, Xiaokui, Yang, Yin
Format:	Preprint
Published:	2025
Subjects:	Databases
Online Access:	https://arxiv.org/abs/2503.22970
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913765863194624
author	Cai, Kuntai Xiao, Xiaokui Yang, Yin
author_facet	Cai, Kuntai Xiao, Xiaokui Yang, Yin
contents	Releasing relational databases while preserving privacy is an important research problem with numerous applications. A canonical approach is to generate synthetic data under differential privacy (DP), which provides a strong, rigorous privacy guarantee. The problem is particularly challenging when the data involve not only entities (e.g., represented by records in tables) but also relationships (represented by foreign-key references), since if we generate random records for each entity independently, the resulting synthetic data usually fail to exhibit realistic relationships. The current state of the art, PrivLava, addresses this issue by generating random join key attributes through a sophisticated expectation-maximization (EM) algorithm. This method, however, is rather costly in terms of privacy budget consumption, due to the numerous EM iterations needed to retain high data utility. Consequently, the privacy cost of PrivLava can be prohibitive for some real-world scenarios. We present a sophisticated PrivPetal approach that addresses the above issues via a novel concept: permutation relation, which is constructed as a surrogate to synthesize the flattened relation, avoiding the generation of a high-dimensional relation directly. The synthesis is done using a refined Markov random field mechanism, backed by fine-grained privacy analysis. Extensive experiments using multiple real datasets and the TPC-H benchmark demonstrate that PrivPetal significantly outperforms existing methods in terms of aggregate query accuracy on the synthetic data.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_22970
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	PrivPetal: Relational Data Synthesis via Permutation Relations Cai, Kuntai Xiao, Xiaokui Yang, Yin Databases Releasing relational databases while preserving privacy is an important research problem with numerous applications. A canonical approach is to generate synthetic data under differential privacy (DP), which provides a strong, rigorous privacy guarantee. The problem is particularly challenging when the data involve not only entities (e.g., represented by records in tables) but also relationships (represented by foreign-key references), since if we generate random records for each entity independently, the resulting synthetic data usually fail to exhibit realistic relationships. The current state of the art, PrivLava, addresses this issue by generating random join key attributes through a sophisticated expectation-maximization (EM) algorithm. This method, however, is rather costly in terms of privacy budget consumption, due to the numerous EM iterations needed to retain high data utility. Consequently, the privacy cost of PrivLava can be prohibitive for some real-world scenarios. We present a sophisticated PrivPetal approach that addresses the above issues via a novel concept: permutation relation, which is constructed as a surrogate to synthesize the flattened relation, avoiding the generation of a high-dimensional relation directly. The synthesis is done using a refined Markov random field mechanism, backed by fine-grained privacy analysis. Extensive experiments using multiple real datasets and the TPC-H benchmark demonstrate that PrivPetal significantly outperforms existing methods in terms of aggregate query accuracy on the synthetic data.
title	PrivPetal: Relational Data Synthesis via Permutation Relations
topic	Databases
url	https://arxiv.org/abs/2503.22970

Similar Items