Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Ji, Jiabao; Liu, Yujian; Zhang, Yang; Liu, Gaowen; Kompella, Ramana Rao; Liu, Sijia; Chang, Shiyu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.08607

_version_	1866911915578490880
author	Ji, Jiabao; Liu, Yujian; Zhang, Yang; Liu, Gaowen; Kompella, Ramana Rao; Liu, Sijia; Chang, Shiyu
author_facet	Ji, Jiabao; Liu, Yujian; Zhang, Yang; Liu, Gaowen; Kompella, Ramana Rao; Liu, Sijia; Chang, Shiyu
contents	As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM unlearning task typically involves two goals: (1) The target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge that the LLM possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework with a combination of two objectives - maximizing the prediction loss on the forget documents while minimizing that on the retain documents, which suffers from two challenges, degenerated output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. ULD then derives the unlearned LLM by computing the logit difference between the target and the assistant LLMs. We show that such reversed objectives would naturally resolve both aforementioned challenges while significantly improving the training efficiency. Extensive experiments demonstrate that our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold. Notably, our method loses 0% of model utility on the ToFU benchmark, whereas baseline methods may sacrifice 17% of utility on average to achieve comparable forget quality. Our code will be publicly available at https://github.com/UCSB-NLP-Chang/ULD.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_08607
institution	arXiv
publishDate	2024
record_format	arxiv
title	Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2406.08607
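
The logit-difference step described in the contents field can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the function names, the `alpha` weight, and the three-token logit values are all assumptions made for illustration. The record itself only states that the unlearned LLM is derived from the logit difference between the target and the assistant LLMs.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_difference(target_logits, assistant_logits, alpha=1.0):
    """Subtract the assistant's logits from the target's, token by token.

    `alpha` is a hypothetical scaling weight added for this sketch;
    alpha=1.0 gives the plain difference.
    """
    if len(target_logits) != len(assistant_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    return [t - alpha * a for t, a in zip(target_logits, assistant_logits)]


# Toy 3-token vocabulary (made-up numbers).
# Token 0 stands for content that appears only in the forget documents.
target_logits = [4.0, 1.0, 0.5]      # target LLM: strongly prefers token 0
assistant_logits = [5.0, 0.0, 0.0]   # assistant: remembers only the forget content

unlearned_logits = logit_difference(target_logits, assistant_logits)
before = softmax(target_logits)
after = softmax(unlearned_logits)

print("P(forget token) before:", round(before[0], 3))
print("P(forget token) after: ", round(after[0], 3))
```

In this toy setting the assistant assigns high logits only to the forget-related token, so the subtraction lowers that token's probability while leaving the relative order of the other tokens unchanged.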

Similar Items