Staff View: Gauss-Newton Unlearning for the LLM Era :: Library Catalog

Bibliographic Details
Main Authors:	McKinney, Lev; Thudi, Anvith; Bae, Juhan; Rezaei, Tara; Papernot, Nicolas; McIlraith, Sheila A.; Grosse, Roger
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.10568

_version_	1866910018799927296
author	McKinney, Lev; Thudi, Anvith; Bae, Juhan; Rezaei, Tara; Papernot, Nicolas; McIlraith, Sheila A.; Grosse, Roger
author_facet	McKinney, Lev; Thudi, Anvith; Bae, Juhan; Rezaei, Tara; Papernot, Nicolas; McIlraith, Sheila A.; Grosse, Roger
contents	Standard large language model training can create models that produce outputs their trainer deems unacceptable in deployment. The probability of these outputs can be reduced using methods such as LLM unlearning. However, unlearning a set of data (called the forget set) can degrade model performance on other distributions where the trainer wants to retain the model's behavior. To improve this trade-off, we demonstrate that using the forget set to compute only a few uphill Gauss-Newton steps provides a conceptually simple, state-of-the-art unlearning approach for LLMs. While Gauss-Newton steps adapt Newton's method to non-linear models, it is non-trivial to efficiently and accurately compute such steps for LLMs. Hence, our approach crucially relies on parametric Hessian approximations such as Kronecker-Factored Approximate Curvature (K-FAC). We call this combined approach K-FADE (K-FAC for Distribution Erasure). Our evaluation on the WMDP and ToFU benchmarks demonstrates that K-FADE suppresses outputs from the forget set and approximates, in output space, the results of retraining without the forget set. Critically, our method does this while altering the outputs on the retain set less than previous methods. This is because K-FADE transforms a constraint on the model's outputs across the entire retain set into a constraint on the model's weights, allowing the algorithm to minimally change the model's behavior on the retain set at each step. Moreover, the unlearning updates computed by K-FADE can be reapplied later if the model undergoes further training, allowing unlearning to be cheaply maintained.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_10568
institution	arXiv
publishDate	2026
record_format	arxiv
title	Gauss-Newton Unlearning for the LLM Era
topic	Machine Learning
url	https://arxiv.org/abs/2602.10568

Similar Items