Bibliographic Details
Main Authors: Downes, Stephen M., Forber, Patrick, Grzankowski, Alex
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2408.04666
_version_ 1866916350732009472
author Downes, Stephen M.
Forber, Patrick
Grzankowski, Alex
author_facet Downes, Stephen M.
Forber, Patrick
Grzankowski, Alex
contents LLMs are statistical models of language that learn through stochastic gradient descent with a next-token prediction objective. This has prompted a popular view among AI modelers: LLMs are just next-token predictors. While LLMs are engineered using next-token prediction, and trained based on their success at this task, our view is that reducing them to mere next-token predictors sells LLMs short. Moreover, important explanations of LLM behavior and capabilities are lost when we engage in this kind of reduction. To draw this out, we make an analogy with a once-prominent research program in biology that explained evolution and development from the gene's eye view.
format Preprint
id arxiv_https___arxiv_org_abs_2408_04666
institution arXiv
publishDate 2024
record_format arxiv
title LLMs are Not Just Next Token Predictors
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2408.04666