Staff View: Large Language Models are Algorithmically Blind :: Library Catalog

Bibliographic Details
Main Authors:	Venkatesh, Sohan; Kurapath, Ashish Mahendran; Melkote, Tejas
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.21947

_version_	1866908938686955520
author	Venkatesh, Sohan; Kurapath, Ashish Mahendran; Melkote, Tejas
author_facet	Venkatesh, Sohan; Kurapath, Ashish Mahendran; Melkote, Tejas
contents	Large language models (LLMs) demonstrate remarkable breadth of knowledge, yet their ability to reason about computational processes remains poorly understood. Closing this gap matters for practitioners who rely on LLMs to guide algorithm selection and deployment. We address this limitation using causal discovery as a testbed and evaluate eight frontier LLMs against ground truth derived from algorithm executions. We find systematic, near-total failure across models. The predicted ranges are far wider than true confidence intervals yet still fail to contain the true algorithmic mean in most cases. Most models perform worse than random guessing and the best model's marginal improvement is attributable to benchmark memorization rather than principled reasoning. We term this failure algorithmic blindness and argue it reflects a fundamental gap between declarative knowledge about algorithms and calibrated procedural prediction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_21947
institution	arXiv
publishDate	2026
record_format	arxiv
title	Large Language Models are Algorithmically Blind
topic	Computation and Language
url	https://arxiv.org/abs/2602.21947

Similar Items