Bibliographic Details
Main Authors: Venkatesh, Sohan, Kurapath, Ashish Mahendran, Melkote, Tejas
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2602.21947
_version_ 1866908938686955520
author Venkatesh, Sohan
Kurapath, Ashish Mahendran
Melkote, Tejas
contents Large language models (LLMs) demonstrate remarkable breadth of knowledge, yet their ability to reason about computational processes remains poorly understood. Closing this gap matters for practitioners who rely on LLMs to guide algorithm selection and deployment. We address this limitation using causal discovery as a testbed and evaluate eight frontier LLMs against ground truth derived from algorithm executions. We find systematic, near-total failure across models. The predicted ranges are far wider than true confidence intervals yet still fail to contain the true algorithmic mean in most cases. Most models perform worse than random guessing and the best model's marginal improvement is attributable to benchmark memorization rather than principled reasoning. We term this failure algorithmic blindness and argue it reflects a fundamental gap between declarative knowledge about algorithms and calibrated procedural prediction.
format Preprint
id arxiv_https___arxiv_org_abs_2602_21947
institution arXiv
publishDate 2026
record_format arxiv
title Large Language Models are Algorithmically Blind
topic Computation and Language
url https://arxiv.org/abs/2602.21947