Bibliographic Details
Main Authors: Gusmão, Kin Max Piamolini; Gavenski, Nathan; Oren, Nir; Meneguzzi, Felipe
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.15333
Abstract:
  • Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a fundamental difference in evidence integration rather than domain familiarity. These findings position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.