Staff View: :: Library Catalog

Bibliographic Details
Main Author:	Sun, Xinhai
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2602.08520

_version_	1866912898510487552
author	Sun, Xinhai
author_facet	Sun, Xinhai
contents	Modern large language models (LLMs) are often evaluated and deployed under a one-shot, greedy inference protocol, especially in professional settings that require deterministic behavior. This regime can systematically underestimate a fixed model's true capability: many errors arise not from missing knowledge, but from premature commitment under internal ambiguity. We introduce Reinforcement Inference, an entropy-aware inference-time control strategy that uses the model's own uncertainty to selectively invoke a second, more deliberate reasoning attempt, enabling stronger performance without any retraining. On 12,032 MMLU-Pro questions across 14 subjects, using DeepSeek-v3.2 with deterministic decoding in a zero-shot setting, Reinforcement Inference improves accuracy from 60.72% to 84.03% while incurring only 61.06% additional inference calls. A 100% re-asking ablation reaches 84.35%, indicating that uncertainty-aware selection captures most of the attainable improvement with substantially less compute. Moreover, a prompt-only ablation underperforms the baseline, suggesting that the gains are not explained by generic prompting alone. Beyond providing a practical inference-time upgrade, our results suggest a broader entropy-aware paradigm for measuring and expanding model capability: because modern decoder-based models generate outputs autoregressively, entropy and related confidence measures arise naturally as first-class control signals during generation. The resulting gap between one-pass greedy inference and uncertainty-conditioned deliberation offers a diagnostic lens on an LLM's latent reasoning horizon and motivates future training objectives that explicitly constrain correctness–confidence alignment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_08520
institution	arXiv
publishDate	2026
record_format	arxiv
title	Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2602.08520
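
The abstract describes the control loop only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the uncertainty signal is the Shannon entropy of the model's probability distribution over answer options; `first_pass`, `second_pass`, and `threshold_bits` are hypothetical names standing in for the one-shot greedy call, the more deliberate re-ask, and the uncertainty cutoff. It also computes, from the accuracies reported in the abstract, the share of the 100% re-asking gain that selective re-asking retains.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical representation: probability the model assigns to each answer letter.
AnswerDist = Dict[str, float]


def entropy_bits(dist: AnswerDist) -> float:
    """Shannon entropy (bits) of an answer distribution, renormalized to sum to 1."""
    total = sum(dist.values())
    h = 0.0
    for p in dist.values():
        q = p / total
        if q > 0:  # 0 * log 0 is taken as 0
            h -= q * math.log2(q)
    return h


def reinforcement_inference(
    question: str,
    first_pass: Callable[[str], AnswerDist],   # hypothetical one-shot greedy call
    second_pass: Callable[[str], AnswerDist],  # hypothetical deliberate re-ask
    threshold_bits: float,                     # hypothetical uncertainty cutoff
) -> Tuple[str, bool]:
    """Return (answer, re_asked); re-ask only when first-pass entropy exceeds the cutoff."""
    dist = first_pass(question)
    re_asked = entropy_bits(dist) > threshold_bits
    if re_asked:
        dist = second_pass(question)  # one extra inference call for this question
    return max(dist, key=dist.get), re_asked


def fraction_of_gain_recovered(baseline: float, selective: float, full: float) -> float:
    """Share of the 100%-re-asking accuracy gain that selective re-asking achieves."""
    return (selective - baseline) / (full - baseline)


if __name__ == "__main__":
    # Toy stand-ins for model calls (hypothetical distributions, not real outputs).
    confident = lambda q: {"A": 0.97, "B": 0.01, "C": 0.01, "D": 0.01}  # ~0.24 bits
    unsure = lambda q: {"A": 0.40, "B": 0.35, "C": 0.15, "D": 0.10}     # ~1.80 bits
    deliberate = lambda q: {"A": 0.10, "B": 0.85, "C": 0.03, "D": 0.02}
    print(reinforcement_inference("q1", confident, deliberate, 1.0))  # ('A', False)
    print(reinforcement_inference("q2", unsure, deliberate, 1.0))     # ('B', True)
    # Accuracies from the abstract: baseline, selective, 100% re-asking.
    print(round(fraction_of_gain_recovered(60.72, 84.03, 84.35), 3))  # 0.986
```

Under this sketch, and assuming exactly one re-ask per flagged question, the 61.06% additional-call figure would correspond to the fraction of questions whose first-pass entropy crosses the cutoff; raising or lowering the hypothetical `threshold_bits` trades accuracy against compute.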
