Staff View: Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xuanming; Ashrafi, Shwan; Mirsaidova, Aziza; Rezaeian, Amir H.; Ballesteros, Miguel; Chilton, Lydia B.; Yu, Zhou; Roth, Dan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.11038

_version_	1866917419509874688
author	Zhang, Xuanming; Ashrafi, Shwan; Mirsaidova, Aziza; Rezaeian, Amir H.; Ballesteros, Miguel; Chilton, Lydia B.; Yu, Zhou; Roth, Dan
contents	We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than exhaustive reasoning, which incurs high inference costs. Many real-world tasks, such as trip planning, require models to deliver the best possible output within a fixed reasoning budget. We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase. To further enhance efficiency, we propose an inference-time self-improvement method using LLM-synthesized preference data, where models learn from their own reasoning comparisons to produce better intermediate solutions. Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models, improving both reasoning quality and efficiency under budget constraints.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_11038
institution	arXiv
publishDate	2026
record_format	arxiv
title	Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data
topic	Computation and Language
url	https://arxiv.org/abs/2601.11038