Guardado en:
Detalles Bibliográficos
Autores principales: Koeken, Adil, Ziller, Alexander, Knolle, Moritz, Rueckert, Daniel
Formato: Preprint
Publicado: 2025
Materias:
Acceso en línea:https://arxiv.org/abs/2510.01793
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
_version_ 1866909821082533888
author Koeken, Adil
Ziller, Alexander
Knolle, Moritz
Rueckert, Daniel
author_facet Koeken, Adil
Ziller, Alexander
Knolle, Moritz
Rueckert, Daniel
contents The generation of privacy-preserving synthetic datasets is a promising avenue for overcoming data scarcity in medical AI research. Post-hoc privacy filtering techniques, designed to remove samples containing personally identifiable information, have recently been proposed as a solution. However, their effectiveness remains largely unverified. This work presents a rigorous evaluation of a filtering pipeline applied to chest X-ray synthesis. Contrary to claims from the original publications, our results demonstrate that current filters exhibit limited specificity and consistency, achieving high sensitivity only for real images while failing to reliably detect near-duplicates generated from training data. These results demonstrate a critical limitation of post-hoc filtering: rather than effectively safeguarding patient privacy, these methods may provide a false sense of security while leaving unacceptable levels of patient information exposed. We conclude that substantial advances in filter design are needed before these methods can be confidently deployed in sensitive applications.
format Preprint
id arxiv_https___arxiv_org_abs_2510_01793
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Sensitivity, Specificity, and Consistency: A Tripartite Evaluation of Privacy Filters for Synthetic Data Generation
Koeken, Adil
Ziller, Alexander
Knolle, Moritz
Rueckert, Daniel
Machine Learning
The generation of privacy-preserving synthetic datasets is a promising avenue for overcoming data scarcity in medical AI research. Post-hoc privacy filtering techniques, designed to remove samples containing personally identifiable information, have recently been proposed as a solution. However, their effectiveness remains largely unverified. This work presents a rigorous evaluation of a filtering pipeline applied to chest X-ray synthesis. Contrary to claims from the original publications, our results demonstrate that current filters exhibit limited specificity and consistency, achieving high sensitivity only for real images while failing to reliably detect near-duplicates generated from training data. These results demonstrate a critical limitation of post-hoc filtering: rather than effectively safeguarding patient privacy, these methods may provide a false sense of security while leaving unacceptable levels of patient information exposed. We conclude that substantial advances in filter design are needed before these methods can be confidently deployed in sensitive applications.
title Sensitivity, Specificity, and Consistency: A Tripartite Evaluation of Privacy Filters for Synthetic Data Generation
topic Machine Learning
url https://arxiv.org/abs/2510.01793