Staff View: Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? :: Library Catalog

Bibliographic Details
Main Authors:	Lee, Celine; Rush, Alexander M.; Vafa, Keyon
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.01935

_version_	1866909562173390848
author	Lee, Celine; Rush, Alexander M.; Vafa, Keyon
author_facet	Lee, Celine; Rush, Alexander M.; Vafa, Keyon
contents	Large language models (LLMs) often benefit from verbalized reasoning at inference time, but it remains unclear which aspects of task difficulty these extra reasoning tokens address. To investigate this question, we formalize a framework using deterministic finite automata (DFAs). DFAs offer a formalism through which we can characterize task complexity through measurable properties such as run length (number of reasoning steps required) and state-space size (decision complexity). We first show that across different tasks and models of different sizes and training paradigms, there exists an optimal number of reasoning tokens such that the probability of producing a correct solution is maximized. We then investigate which properties of complexity govern this critical length: we find that task instances with longer underlying DFA runs (i.e., those that demand more latent state tracking) correlate with longer reasoning lengths, but, surprisingly, that DFA size (i.e., state-space complexity) does not. We then demonstrate an implication of these findings: predicting the optimal number of reasoning tokens for new problems and filtering out answers of non-optimal length results in consistent accuracy improvements.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_01935
institution	arXiv
publishDate	2025
record_format	arxiv
title	Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length?
topic	Artificial Intelligence
url	https://arxiv.org/abs/2504.01935

Similar Items