Staff View: CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion :: Library Catalog

Bibliographic Details
Main Authors:	Kazdan, Joshua; Sun, Hao; Han, Jiaqi; Petersen, Felix; Ermon, Stefano
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2409.07025

_version_	1866917772982747136
author	Kazdan, Joshua; Sun, Hao; Han, Jiaqi; Petersen, Felix; Ermon, Stefano
author_facet	Kazdan, Joshua; Sun, Hao; Han, Jiaqi; Petersen, Felix; Ermon, Stefano
contents	Diffusion models have a tendency to exactly replicate their training data, especially when trained on small datasets. Most prior work has sought to mitigate this problem by imposing differential privacy constraints or masking parts of the training data, resulting in a substantial decrease in image quality. We present CPSample, a method that modifies the sampling process to prevent training data replication while preserving image quality. CPSample uses a classifier that is trained to overfit on random binary labels attached to the training data. CPSample then uses classifier guidance to steer the generation process away from the set of points that can be classified with high certainty, a set that includes the training data. CPSample achieves FID scores of 4.97 and 2.97 on CIFAR-10 and CelebA-64, respectively, without producing exact replicates of the training data. Unlike prior methods intended to guard the training images, CPSample only requires training a classifier rather than retraining a diffusion model, which is computationally cheaper. Moreover, our technique provides diffusion models with greater robustness against membership inference attacks, wherein an adversary attempts to discern which images were in the model's training dataset. We show that CPSample behaves like a built-in rejection sampler, and we demonstrate its ability to prevent mode collapse in Stable Diffusion.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_07025
institution	arXiv
publishDate	2024
record_format	arxiv
title	CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
topic	Machine Learning
url	https://arxiv.org/abs/2409.07025
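
Note on the method summarized in the contents field: the idea of steering samples away from points that a classifier labels with high certainty can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical linear logistic classifier (`w`, `b`), a hypothetical certainty threshold `alpha`, and a hypothetical step size `scale`, and it nudges the sample itself rather than a diffusion model's noise prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def certainty_step(x, w, b, alpha=0.9, scale=0.5):
    """Toy guidance step (hypothetical): if the classifier is too certain
    about x, nudge x so the predicted probability moves toward 0.5."""
    p = sigmoid(x @ w + b)
    if 1.0 - alpha <= p <= alpha:
        return x  # classifier is uncertain here; leave the sample alone
    # Gradient of p with respect to x for a logistic classifier.
    grad_p = p * (1.0 - p) * w
    # Move against the gradient if p is too high, along it if p is too low.
    direction = -np.sign(p - 0.5) * grad_p
    # Normalize so each step has length (approximately) `scale`.
    return x + scale * direction / (np.linalg.norm(grad_p) + 1e-12)

# Usage: start at a point the toy classifier is very sure about.
w, b = np.array([1.0, 0.0]), 0.0
x = np.array([3.0, 0.0])
for _ in range(10):
    x = certainty_step(x, w, b)
print(sigmoid(x @ w + b))  # probability now lies within [1 - alpha, alpha]
```

In this sketch, points inside the uncertain band are left untouched, so only samples near confidently classified regions (which, per the abstract, include the training data) are perturbed.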

Similar Items