Staff View: Machine Unlearning in Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Kongyang; Wang, Zixin; Mi, Bing; Liu, Waixi; Wang, Shaowei; Ren, Xiaojun; Shen, Jiaxing
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2404.16841

_version_	1866909182774476800
author	Chen, Kongyang; Wang, Zixin; Mi, Bing; Liu, Waixi; Wang, Shaowei; Ren, Xiaojun; Shen, Jiaxing
author_facet	Chen, Kongyang; Wang, Zixin; Mi, Bing; Liu, Waixi; Wang, Shaowei; Ren, Xiaojun; Shen, Jiaxing
contents	Recently, large language models (LLMs) have emerged as a notable field, attracting significant attention for their ability to automatically generate intelligent content across various application domains. However, LLMs still suffer from significant security and privacy issues. For example, LLMs might expose private user data through hacking attacks or targeted prompts. To address this problem, this paper introduces a novel machine unlearning framework for LLMs. Our objective is to prevent LLMs from producing harmful, hallucinatory, or privacy-compromising responses while retaining their standard output capabilities. To accomplish this, we use an evaluative model to identify dialogues that require unlearning. We also define a distance loss that serves as the model's negative loss, steering it away from its previous undesirable outputs. Furthermore, we compute the cluster mean of the expected outputs to formulate a positive loss, directing the model's outputs toward preferable outcomes without compromising its reasoning ability or performance. Experimental results show that our approach effectively meets the unlearning objectives without substantially compromising model performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_16841
institution	arXiv
publishDate	2024
record_format	arxiv
title	Machine Unlearning in Large Language Models
topic	Cryptography and Security
url	https://arxiv.org/abs/2404.16841

Similar Items