Guardado en:
Detalles Bibliográficos
Autores principales: Ezzeddine, Fatima, Zammar, Osama, Giordano, Silvia, Ayoub, Omran
Formato: Preprint
Publicado: 2026
Materias:
Acceso en línea:https://arxiv.org/abs/2602.03611
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
_version_ 1866915771474509824
author Ezzeddine, Fatima
Zammar, Osama
Giordano, Silvia
Ayoub, Omran
author_facet Ezzeddine, Fatima
Zammar, Osama
Giordano, Silvia
Ayoub, Omran
contents Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vulnerable to privacy attacks such as membership inference and model extraction, and the impact of explanations on this threat landscape remains insufficiently understood. In this work, we focus on the problem of how CFs expand the attack surface of MLaaS by strengthening membership inference attacks (MIAs), and on the need to design defense mechanisms that mitigate this emerging risk without undermining utility and explainability. First, we systematically analyze how exposing CFs through query-based APIs enables more effective shadow-based MIAs. Second, we propose a defense framework that integrates Differential Privacy (DP) with Active Learning (AL) to jointly reduce memorization and limit effective training data exposure. Finally, we conduct an extensive empirical evaluation to characterize the three-way trade-off between privacy leakage, predictive performance, and explanation quality. Our findings highlight the need to carefully balance transparency, utility, and privacy in the responsible deployment of explainable MLaaS systems.
format Preprint
id arxiv_https___arxiv_org_abs_2602_03611
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Explanations Leak: Membership Inference with Differential Privacy and Active Learning Defense
Ezzeddine, Fatima
Zammar, Osama
Giordano, Silvia
Ayoub, Omran
Machine Learning
Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vulnerable to privacy attacks such as membership inference and model extraction, and the impact of explanations on this threat landscape remains insufficiently understood. In this work, we focus on the problem of how CFs expand the attack surface of MLaaS by strengthening membership inference attacks (MIAs), and on the need to design defense mechanisms that mitigate this emerging risk without undermining utility and explainability. First, we systematically analyze how exposing CFs through query-based APIs enables more effective shadow-based MIAs. Second, we propose a defense framework that integrates Differential Privacy (DP) with Active Learning (AL) to jointly reduce memorization and limit effective training data exposure. Finally, we conduct an extensive empirical evaluation to characterize the three-way trade-off between privacy leakage, predictive performance, and explanation quality. Our findings highlight the need to carefully balance transparency, utility, and privacy in the responsible deployment of explainable MLaaS systems.
title Explanations Leak: Membership Inference with Differential Privacy and Active Learning Defense
topic Machine Learning
url https://arxiv.org/abs/2602.03611