Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Block, Jacob L.; Mohri, Mehryar; Mokhtari, Aryan; Shakkottai, Sanjay
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.10217

author	Block, Jacob L.; Mohri, Mehryar; Mokhtari, Aryan; Shakkottai, Sanjay
contents	We study machine unlearning in large generative models by framing the task as density ratio estimation to a target distribution rather than supervised fine-tuning. While classifier guidance is a standard approach for approximating this ratio and can succeed in general, we show it can fail to faithfully unlearn with finite samples when the forget set represents a sharp, concentrated data distribution. To address this, we introduce Temper-Then-Tilt Unlearning (T3-Unlearning), which freezes the base model and applies a two-step inference procedure: (i) tempering the base distribution to flatten high-confidence spikes, and (ii) tilting the tempered distribution using a lightweight classifier trained to distinguish retain from forget samples. Our theoretical analysis provides finite-sample guarantees linking the surrogate classifier's risk to unlearning error, proving that tempering is necessary to successfully unlearn for concentrated distributions. Empirical evaluations on the TOFU benchmark show that T3-Unlearning improves forget quality and generative utility over existing baselines, while training only a fraction of the parameters with a minimal runtime.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_10217
institution	arXiv
publishDate	2026
record_format	arxiv
title	Temper-Then-Tilt: Principled Unlearning for Generative Models through Tempering and Classifier Guidance
topic	Machine Learning
url	https://arxiv.org/abs/2602.10217
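The two-step inference procedure in the contents field (temper, then tilt with a retain-vs-forget classifier) can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on stated assumptions. Tempering is assumed to be the power transform q(x) ∝ p(x)^(1/T) with T > 1. Tilting is assumed to be plain classifier guidance, r(x) ∝ q(x) · P(retain | x), where a frozen base model supplies p over a small candidate set (for example, next tokens). The functions `temper`, `tilt`, and `temper_then_tilt`, and the toy classifier, are hypothetical. The example shows only that a sharp spike can survive a bounded classifier reweighting unless it is first flattened. It does not reproduce the paper's finite-sample analysis.

```python
import math
from typing import Callable, Dict

Dist = Dict[str, float]


def _normalize_logits(logits: Dist) -> Dist:
    """Softmax over a dict of log-weights, shifted by the max for stability."""
    m = max(logits.values())
    exps = {x: math.exp(v - m) for x, v in logits.items()}
    z = sum(exps.values())
    return {x: e / z for x, e in exps.items()}


def temper(probs: Dist, temperature: float) -> Dist:
    """Hypothetical tempering step: q(x) ∝ p(x) ** (1 / temperature).

    temperature > 1 flattens high-confidence spikes; zero-probability
    outcomes are dropped because they stay at zero after tempering.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = {x: math.log(p) / temperature for x, p in probs.items() if p > 0}
    return _normalize_logits(logits)


def tilt(probs: Dist, retain_prob: Callable[[str], float]) -> Dist:
    """Hypothetical tilting step (classifier guidance): r(x) ∝ q(x) * P(retain | x)."""
    weights = {x: q * retain_prob(x) for x, q in probs.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("classifier assigns zero retain probability everywhere")
    return {x: w / total for x, w in weights.items()}


def temper_then_tilt(
    probs: Dist, temperature: float, retain_prob: Callable[[str], float]
) -> Dist:
    """Apply tempering to the frozen base distribution, then tilt it."""
    return tilt(temper(probs, temperature), retain_prob)


if __name__ == "__main__":
    # Toy base distribution with a sharp spike on a "forget" outcome.
    base = {"secret": 0.97, "a": 0.02, "b": 0.01}
    # Toy classifier: estimated probability that an outcome belongs to the retain set.
    clf = lambda x: 0.05 if x == "secret" else 0.9
    print("tilt only:      ", temper_then_tilt(base, 1.0, clf))
    print("temper, T = 3:  ", temper_then_tilt(base, 3.0, clf))
```

With these toy numbers, tilting alone (T = 1) still leaves roughly 64% of the mass on the forget outcome. Tempering with T = 3 before tilting lowers that share to roughly 10%. The temperature and classifier here are arbitrary illustrative choices, not values from the paper.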

Similar Items