Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Hao-Xiang; Ma, Jun-Yu; Gu, Jia-Chen; Ling, Zhen-Hua; Liu, Quan; Liu, Cong
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2310.10322

_version_	1866911735186718720
author	Xu, Hao-Xiang; Ma, Jun-Yu; Gu, Jia-Chen; Ling, Zhen-Hua; Liu, Quan; Liu, Cong
author_facet	Xu, Hao-Xiang; Ma, Jun-Yu; Gu, Jia-Chen; Ling, Zhen-Hua; Liu, Quan; Liu, Cong
contents	Large language models (LLMs) are prone to hallucinate unintended text due to false or outdated knowledge. Since retraining LLMs is resource intensive, there has been a growing interest in model editing. Despite the emergence of benchmarks and approaches, existing unidirectional editing and evaluation paradigms have failed to explore the reversal curse. In this paper, we study bidirectional language model editing, aiming to provide a rigorous evaluation to assess if edited LLMs can recall the edited knowledge bidirectionally. A metric of reverse generalization is introduced and a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) is constructed to evaluate if post-edited models can recall the edited knowledge in the reverse direction of editing. We conduct extensive experiments using a variety of editing methods and LLMs. The results show that while most editing methods are able to accurately recall edited facts along the modification direction, they exhibit substantial systematic deficiencies when evaluated in the reverse direction. To further investigate the underlying causes of the reversal curse and to explore potential strategies for mitigation, a detailed analysis is conducted from three perspectives. Our findings reveal that although In-Context Learning (ICL) can mitigate the reversal curse to a certain extent, it lacks continuity, is limited by the input length, and may introduce hallucinations. Therefore, combining the advantages of ICL and other editing methods is a promising direction for developing new editing paradigms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2310_10322
institution	arXiv
publishDate	2023
record_format	arxiv
title	Evaluating the Reversal Curse in Model Editing
topic	Computation and Language
url	https://arxiv.org/abs/2310.10322
