Bibliographic Details
Main Authors: Niwa, Ayana, Kaneko, Masahiro, Inui, Kentaro
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2502.20620
author Niwa, Ayana
Kaneko, Masahiro
Inui, Kentaro
contents Large language models (LLMs) can exhibit advanced reasoning yet still generate incorrect answers. We hypothesize that such errors frequently stem from spurious beliefs: propositions that the model internally considers true but that are in fact incorrect. To address this, we propose a method to rectify the belief space by suppressing these spurious beliefs while simultaneously enhancing true ones, thereby enabling more reliable inferences. Our approach first identifies the beliefs that lead to incorrect or correct answers by prompting the model to generate textual explanations, using our Forward-Backward Beam Search (FBBS). We then apply unlearning to suppress the identified spurious beliefs and enhance the true ones, effectively rectifying the model's belief space. Empirical results on multiple QA datasets and LLMs show that our method corrects previously misanswered questions without harming overall model performance. Furthermore, our approach yields improved generalization on unseen data, suggesting that rectifying a model's belief space is a promising direction for mitigating errors and enhancing overall reliability.
format Preprint
id arxiv_https___arxiv_org_abs_2502_20620
institution arXiv
publishDate 2025
record_format arxiv
title Rectifying Belief Space via Unlearning to Harness LLMs' Reasoning
topic Computation and Language
url https://arxiv.org/abs/2502.20620