Internal format :: Library Catalog

Bibliographic Details
Main authors:	Qiu, Yiwen; Wu, Linjuan; Liu, Yizhou; Yan, Yuchen; Ma, Jin; Tan, Xu; Hu, Yao; Zhang, Daoxin; Zhang, Wenqi; Lu, Weiming; Xiao, Jun; Shen, Yongliang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2604.19656

_version_	1866913051949662208
author	Qiu, Yiwen; Wu, Linjuan; Liu, Yizhou; Yan, Yuchen; Ma, Jin; Tan, Xu; Hu, Yao; Zhang, Daoxin; Zhang, Wenqi; Lu, Weiming; Xiao, Jun; Shen, Yongliang
author_facet	Qiu, Yiwen; Wu, Linjuan; Liu, Yizhou; Yan, Yuchen; Ma, Jin; Tan, Xu; Hu, Yao; Zhang, Daoxin; Zhang, Wenqi; Lu, Weiming; Xiao, Jun; Shen, Yongliang
contents	Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions -- a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reasoning capability, but from the lack of inferential boundary awareness -- the ability to recognize when the necessary premises for valid inference are missing. To address this issue, we propose Grounded Reasoning via Interactive Reinforcement Learning (GRIL), a multi-turn reinforcement learning framework for grounded reasoning under incomplete information. GRIL decomposes the reasoning process into two stages: clarify and pause, which identifies whether the available information is sufficient, and grounded reasoning, which performs task solving once the necessary premises are established. We design stage-specific rewards to penalize hallucinations, enabling models to detect gaps, stop proactively, and resume reasoning after clarification. Experiments on GSM8K-Insufficient and MetaMATH-Insufficient show that GRIL significantly improves premise detection (up to 45%), leading to a 30% increase in task success while reducing average response length by over 20%. Additional analyses confirm robustness to noisy user responses and generalization to out-of-distribution tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_19656
institution	arXiv
publishDate	2026
record_format	arxiv
title	Pause or Fabricate? Training Language Models for Grounded Reasoning
topic	Computation and Language
url	https://arxiv.org/abs/2604.19656
