Staff View: Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Wei; Xu, Haomei; Liu, Bingqing; Deng, Zhiying; Wang, Haozhao; Wang, Jun; Li, Ruixuan; Teh, Yee Whye; Lee, Wee Sun
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.00625

_version_	1866912620888457216
author	Liu, Wei; Xu, Haomei; Liu, Bingqing; Deng, Zhiying; Wang, Haozhao; Wang, Jun; Li, Ruixuan; Teh, Yee Whye; Lee, Wee Sun
author_facet	Liu, Wei; Xu, Haomei; Liu, Bingqing; Deng, Zhiying; Wang, Haozhao; Wang, Jun; Li, Ruixuan; Teh, Yee Whye; Lee, Wee Sun
contents	Large language models (LLMs) inevitably encode outdated or incorrect knowledge. Updating, deleting, and forgetting such knowledge is important for alignment, safety, and other concerns. To address this, model editing has emerged as a promising paradigm: it precisely edits a small subset of parameters so that a specific fact is updated while other knowledge is preserved. Despite the great success reported in previous papers, we find that the apparent reliability of editing rests on a fragile foundation and that the current literature is largely driven by illusory success. The fundamental goal of steering the model's output toward a target with minimal modification encourages exploiting hidden shortcuts rather than utilizing real semantics. This problem challenges the feasibility of the current model editing literature at its very foundation, as shortcuts are inherently at odds with robust knowledge integration. This issue has long been obscured by evaluation frameworks that lack negative examples. To uncover it, we systematically develop a suite of new evaluation methods. Strikingly, we find that state-of-the-art approaches collapse even under the simplest negation queries. Our empirical evidence shows that editing is likely to be based on shortcuts rather than full semantics, calling for an urgent reconsideration of the very basis of model editing before further advancements can be meaningfully pursued.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_00625
institution	arXiv
publishDate	2025
record_format	arxiv
title	Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation
topic	Artificial Intelligence
url	https://arxiv.org/abs/2510.00625
