Staff View: SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling :: Library Catalog

Bibliographic Details
Main Authors:	Ganev, Georgi; Nazari, Reza; Davison, Rees; Dizche, Amir; Wu, Xinmin; Abbey, Ralph; Silva, Jorge; De Cristofaro, Emiliano
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2510.15083

_version_	1866910037464580096
author	Ganev, Georgi; Nazari, Reza; Davison, Rees; Dizche, Amir; Wu, Xinmin; Abbey, Ralph; Silva, Jorge; De Cristofaro, Emiliano
author_facet	Ganev, Georgi; Nazari, Reza; Davison, Rees; Dizche, Amir; Wu, Xinmin; Abbey, Ralph; Silva, Jorge; De Cristofaro, Emiliano
contents	The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most widely used methods for addressing class imbalance and generating synthetic data. Despite its popularity, little attention has been paid to its privacy implications; yet, it is used in the wild in many privacy-sensitive applications. In this work, we conduct the first systematic study of privacy leakage in SMOTE: we begin by showing that prevailing evaluation practices, i.e., naive distinguishing and distance-to-closest-record metrics, completely fail to detect any leakage and that membership inference attacks (MIAs) can be instantiated with high accuracy. Then, by exploiting SMOTE's geometric properties, we build two novel attacks with very limited assumptions: DistinSMOTE, which perfectly distinguishes real from synthetic records in augmented datasets, and ReconSMOTE, which reconstructs real minority records from synthetic datasets with perfect precision and recall approaching one under realistic imbalance ratios. We also provide theoretical guarantees for both attacks. Experiments on eight standard imbalanced datasets confirm the practicality and effectiveness of these attacks. Overall, our work reveals that SMOTE is inherently non-private and disproportionately exposes minority records, highlighting the need to reconsider its use in privacy-sensitive applications and as a baseline for assessing the privacy of modern generative models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_15083
institution	arXiv
publishDate	2025
record_format	arxiv
title	SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
topic	Cryptography and Security
url	https://arxiv.org/abs/2510.15083
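
The abstract's reference to SMOTE's geometric properties can be illustrated with a minimal sketch, assuming the standard SMOTE interpolation rule: each synthetic record is a random point on the line segment between a real minority record and one of its k nearest minority neighbours. The function names `smote_sample` and `on_segment`, the parameter defaults, and the toy data are hypothetical and chosen only for illustration. This is not the paper's DistinSMOTE or ReconSMOTE attack; it only shows the collinearity that such attacks could plausibly build on.

```python
import numpy as np

def smote_sample(minority, k=2, n_new=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Assumes the rows of `minority` are distinct real minority records.
    """
    rng = np.random.default_rng(seed)
    synthetic, parents = [], []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # Euclidean distances from record i to every minority record
        d = np.linalg.norm(minority - minority[i], axis=1)
        # k nearest neighbours, skipping position 0 (record i itself)
        neighbours = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(minority[i] + lam * (minority[j] - minority[i]))
        parents.append((int(i), int(j)))
    return np.array(synthetic), parents

def on_segment(p, a, b, tol=1e-9):
    """Return True if point p lies on the segment from a to b."""
    ab, ap = b - a, p - a
    t = (ap @ ab) / (ab @ ab)  # position of p's projection along the segment
    off_line = np.linalg.norm(ap - t * ab)  # distance of p from the line
    return bool(-tol <= t <= 1 + tol and off_line < tol)

# Toy minority class with four distinct records (hypothetical data)
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
synthetic, parents = smote_sample(minority)
for s, (i, j) in zip(synthetic, parents):
    print(s, "lies between records", i, "and", j,
          on_segment(s, minority[i], minority[j]))
```

Every synthetic point sits exactly on a segment joining two real records, so the synthetic data carries structural information about where the real minority records are located.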

Similar Items