Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Khan, Mohammad Wahiduzzaman; Chen, Sheng; Mironov, Ilya; Zhang, Leizhen; Noor, Rabib
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Cryptography and Security
Online Access:	https://arxiv.org/abs/2503.12801

_version_	1866912278379495424
author	Khan, Mohammad Wahiduzzaman; Chen, Sheng; Mironov, Ilya; Zhang, Leizhen; Noor, Rabib
author_facet	Khan, Mohammad Wahiduzzaman; Chen, Sheng; Mironov, Ilya; Zhang, Leizhen; Noor, Rabib
contents	Model memorization has implications for both the generalization capacity of machine learning models and the privacy of their training data. This paper investigates label memorization in binary classification models through two novel passive label inference attacks (BLIA). These attacks operate passively, relying solely on the outputs of pre-trained models, such as confidence scores and log-loss values, without interacting with or modifying the training process. By intentionally flipping 50% of the labels in controlled subsets, termed "canaries," we evaluate the extent of label memorization under two conditions: models trained without label differential privacy (Label-DP) and those trained with randomized response-based Label-DP. Despite the application of varying degrees of Label-DP, the proposed attacks consistently achieve success rates exceeding 50%, surpassing the baseline of random guessing and conclusively demonstrating that models memorize training labels, even when these labels are deliberately uncorrelated with the features.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_12801
institution	arXiv
publishDate	2025
record_format	arxiv
title	BLIA: Detect model memorization in binary classification model through passive Label Inference attack
topic	Machine Learning; Cryptography and Security
url	https://arxiv.org/abs/2503.12801
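
Illustrative note on the contents field: the abstract describes three ingredients, namely canary subsets with 50% of labels flipped, randomized response-based Label-DP, and a passive attack that reads only model outputs such as confidence scores and log-loss. The sketch below is a minimal Python illustration of those ideas, not the authors' implementation. All function names are hypothetical, and the decision rule (guess the label with the lower log-loss) is an assumption chosen for simplicity; the record does not specify the two attacks in detail.

```python
import math
import random


def flip_half(labels, rng):
    """Flip exactly half (rounded down) of the binary labels, chosen at random.

    Assumption: this mimics the 'canary' construction in the abstract.
    """
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), len(labels) // 2):
        flipped[i] = 1 - flipped[i]
    return flipped


def randomized_response(label, epsilon, rng):
    """Standard binary randomized response.

    Keeps the label with probability e^eps / (1 + e^eps), else flips it.
    Written as 1 / (1 + e^-eps) to avoid overflow for large eps.
    """
    keep_prob = 1.0 / (1.0 + math.exp(-epsilon))
    return label if rng.random() < keep_prob else 1 - label


def log_loss(p1, label):
    """Binary cross-entropy of a predicted P(y=1) = p1 against a label."""
    tiny = 1e-12
    p1 = min(max(p1, tiny), 1.0 - tiny)  # clamp to avoid log(0)
    return -math.log(p1) if label == 1 else -math.log(1.0 - p1)


def infer_label(p1):
    """Hypothetical passive rule: guess the label with the lower log-loss."""
    return 1 if log_loss(p1, 1) < log_loss(p1, 0) else 0


def attack_success_rate(confidences, training_labels):
    """Fraction of examples whose training label is guessed correctly."""
    hits = sum(infer_label(p) == y
               for p, y in zip(confidences, training_labels))
    return hits / len(training_labels)
```

In this framing, `confidences` would be the model's predicted probabilities on the canary examples and `training_labels` the (possibly flipped or randomized) labels the model was actually trained on. Because canary labels are uncorrelated with the features, a success rate noticeably above 0.5 would suggest memorization rather than generalization.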
