Staff View: Scale-up Unlearnable Examples Learning with High-Performance Computing :: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Yanfan; Lyngaas, Issac; Meena, Murali Gopalakrishnan; Koran, Mary Ellen I.; Malin, Bradley; Moyer, Daniel; Bao, Shunxing; Kapadia, Anuj; Wang, Xiao; Landman, Bennett; Huo, Yuankai
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2501.06080

_version_	1866929670262358016
author	Zhu, Yanfan; Lyngaas, Issac; Meena, Murali Gopalakrishnan; Koran, Mary Ellen I.; Malin, Bradley; Moyer, Daniel; Bao, Shunxing; Kapadia, Anuj; Wang, Xiao; Landman, Bennett; Huo, Yuankai
author_facet	Zhu, Yanfan; Lyngaas, Issac; Meena, Murali Gopalakrishnan; Koran, Mary Ellen I.; Malin, Bradley; Moyer, Daniel; Bao, Shunxing; Kapadia, Anuj; Wang, Xiao; Landman, Bennett; Huo, Yuankai
contents	Recent AI models are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In healthcare, particularly when radiologists use AI-driven diagnostic tools hosted on online platforms, there is a risk that medical imaging data may be repurposed for future AI training without explicit consent, raising critical privacy and intellectual property concerns around healthcare data usage. To address these privacy challenges, Unlearnable Examples (UEs) have been introduced, an approach that aims to make data unlearnable to deep learning models. A prominent method in this area, Unlearnable Clustering (UC), has shown improved UE performance with larger batch sizes but was previously limited by computational resources. To push the boundaries of UE performance with theoretically unlimited resources, we scaled up UC learning across various datasets using Distributed Data Parallel (DDP) training on the Summit supercomputer. Our goal was to examine UE efficacy at high-performance computing (HPC) levels to prevent unauthorized learning and enhance data security, particularly exploring the impact of batch size on UE unlearnability. Using the computational capabilities of Summit, we conducted extensive experiments on diverse datasets such as Pets, MedMNIST, Flowers, and Flowers102. Our findings reveal that both overly large and overly small batch sizes can lead to performance instability and affect accuracy. However, the relationship between batch size and unlearnability varied across datasets, highlighting the need for tailored batch size strategies to achieve optimal data protection. Our results underscore the critical role of selecting batch sizes based on the specific characteristics of each dataset to prevent learning and ensure data security in deep learning applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_06080
institution	arXiv
publishDate	2025
record_format	arxiv
title	Scale-up Unlearnable Examples Learning with High-Performance Computing
topic	Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2501.06080
