Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Abitbul, Koren, Dar, Yehuda
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2310.02897
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916284331982848
author	Abitbul, Koren Dar, Yehuda
author_facet	Abitbul, Koren Dar, Yehuda
contents	Overparameterized autoencoder models often memorize their training data. For image data, memorization is often examined by using the trained autoencoder to recover missing regions in its training images (that were used only in their complete forms in the training). In this paper, we propose an inverse problem perspective for the study of memorization. Given a degraded training image, we define the recovery of the original training image as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset that we aim to retrieve from. We develop the intricate optimization task into a practical method that iteratively applies the trained autoencoder and relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method for blind inpainting where the goal is to recover training images from degradation of many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders. Importantly, our method greatly improves the recovery performance also in settings that were previously considered highly challenging, and even impractical, for such recovery and memorization evaluation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2310_02897
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	How Much Training Data is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation Abitbul, Koren Dar, Yehuda Machine Learning Overparameterized autoencoder models often memorize their training data. For image data, memorization is often examined by using the trained autoencoder to recover missing regions in its training images (that were used only in their complete forms in the training). In this paper, we propose an inverse problem perspective for the study of memorization. Given a degraded training image, we define the recovery of the original training image as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset that we aim to retrieve from. We develop the intricate optimization task into a practical method that iteratively applies the trained autoencoder and relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method for blind inpainting where the goal is to recover training images from degradation of many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous memorization-evaluation methods that recover training data from autoencoders. Importantly, our method greatly improves the recovery performance also in settings that were previously considered highly challenging, and even impractical, for such recovery and memorization evaluation.
title	How Much Training Data is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation
topic	Machine Learning
url	https://arxiv.org/abs/2310.02897

Similar Items