Saved in:
Bibliographic Details
Main Author: Sun, Xinhai
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.08520
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866912898510487552
author Sun, Xinhai
author_facet Sun, Xinhai
contents Modern large language models (LLMs) are often evaluated and deployed under a one-shot, greedy inference protocol, especially in professional settings that require deterministic behavior. This regime can systematically under-estimate a fixed model's true capability: many errors arise not from missing knowledge, but from premature commitment under internal ambiguity. We introduce Reinforcement Inference, an entropy-aware inference-time control strategy that uses the model's own uncertainty to selectively invoke a second, more deliberate reasoning attempt, enabling stronger performance without any retraining. On 12,032 MMLU-Pro questions across 14 subjects, using DeepSeek-v3.2 with deterministic decoding in a zero-shot setting, Reinforcement Inference improves accuracy from 60.72% to 84.03%, while only incurring 61.06% additional inference calls. A 100% re-asking ablation reaches 84.35%, indicating that uncertainty-aware selection captures most of the attainable improvement with substantially less compute. Moreover, a prompt-only ablation underperforms the baseline, suggesting that the gains are not explained by generic prompting alone. Beyond providing a practical inference-time upgrade, our results suggest a broader entropy-aware paradigm for measuring and expanding model capability: because modern decoder-based models generate outputs autoregressively, entropy and related confidence measures arise naturally as first-class control signals during generation. The resulting gap between one-pass greedy inference and uncertainty-conditioned deliberation offers a diagnostic lens on an LLM's latent reasoning horizon and motivates future training objectives that explicitly constrain correctness--confidence alignment.
format Preprint
id arxiv_https___arxiv_org_abs_2602_08520
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
Sun, Xinhai
Artificial Intelligence
Machine Learning
Modern large language models (LLMs) are often evaluated and deployed under a one-shot, greedy inference protocol, especially in professional settings that require deterministic behavior. This regime can systematically under-estimate a fixed model's true capability: many errors arise not from missing knowledge, but from premature commitment under internal ambiguity. We introduce Reinforcement Inference, an entropy-aware inference-time control strategy that uses the model's own uncertainty to selectively invoke a second, more deliberate reasoning attempt, enabling stronger performance without any retraining. On 12,032 MMLU-Pro questions across 14 subjects, using DeepSeek-v3.2 with deterministic decoding in a zero-shot setting, Reinforcement Inference improves accuracy from 60.72% to 84.03%, while only incurring 61.06% additional inference calls. A 100% re-asking ablation reaches 84.35%, indicating that uncertainty-aware selection captures most of the attainable improvement with substantially less compute. Moreover, a prompt-only ablation underperforms the baseline, suggesting that the gains are not explained by generic prompting alone. Beyond providing a practical inference-time upgrade, our results suggest a broader entropy-aware paradigm for measuring and expanding model capability: because modern decoder-based models generate outputs autoregressively, entropy and related confidence measures arise naturally as first-class control signals during generation. The resulting gap between one-pass greedy inference and uncertainty-conditioned deliberation offers a diagnostic lens on an LLM's latent reasoning horizon and motivates future training objectives that explicitly constrain correctness--confidence alignment.
title Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2602.08520