Staff View: Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Niwa, Ayana; Kaneko, Masahiro; Inui, Kentaro
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.20620

_version_	1866908410158514176
author	Niwa, Ayana; Kaneko, Masahiro; Inui, Kentaro
author_facet	Niwa, Ayana; Kaneko, Masahiro; Inui, Kentaro
contents	Large language models (LLMs) can exhibit advanced reasoning yet still generate incorrect answers. We hypothesize that such errors frequently stem from spurious beliefs, propositions the model internally considers true but are incorrect. To address this, we propose a method to rectify the belief space by suppressing these spurious beliefs while simultaneously enhancing true ones, thereby enabling more reliable inferences. Our approach first identifies the beliefs that lead to incorrect or correct answers by prompting the model to generate textual explanations, using our Forward-Backward Beam Search (FBBS). We then apply unlearning to suppress the identified spurious beliefs and enhance the true ones, effectively rectifying the model's belief space. Empirical results on multiple QA datasets and LLMs show that our method corrects previously misanswered questions without harming overall model performance. Furthermore, our approach yields improved generalization on unseen data, suggesting that rectifying a model's belief space is a promising direction for mitigating errors and enhancing overall reliability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_20620
institution	arXiv
publishDate	2025
record_format	arxiv
title	Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning
topic	Computation and Language
url	https://arxiv.org/abs/2502.20620

Similar Items