Enregistré dans:
Détails bibliographiques
Auteur principal: Klein, Colin
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2508.18598
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
Table des matières:
  • What do large language models actually model? Do they tell us something about human capacities, or are they models of the corpus we've trained them on? I give a non-deflationary defence of the latter position. Cognitive science tells us that linguistic capabilities in humans rely supralinear formats for computation. The transformer architecture, by contrast, supports at best a linear formats for processing. This argument will rely primarily on certain invariants of the computational architecture of transformers. I then suggest a positive story about what transformers are doing, focusing on Liu et al. (2022)'s intriguing speculations about shortcut automata. I conclude with why I don't think this is a terribly deflationary story. Language is not (just) a means for expressing inner state but also a kind of 'discourse machine' that lets us make new language given appropriate context. We have learned to use this technology in one way; LLMs have also learned to use it too, but via very different means.