Bibliographic Details
Main Authors: Dalal, Siddhartha; Misra, Vishal
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2402.03175
author Dalal, Siddhartha
Misra, Vishal
author_facet Dalal, Siddhartha
Misra, Vishal
contents This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs), focusing on their core optimization metric of next-token prediction. We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix. Key contributions include: (i) a continuity theorem relating embeddings to multinomial distributions, (ii) a demonstration that LLM text generation aligns with Bayesian learning principles, (iii) an explanation for the emergence of in-context learning in larger models, and (iv) empirical validation using visualizations of next-token probabilities from an instrumented Llama model. Our findings provide new insights into LLM functioning, offering a statistical foundation for understanding their capabilities and limitations. This framework has implications for LLM design, training, and application, potentially guiding future developments in the field.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03175
institution arXiv
publishDate 2024
record_format arxiv
title Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference
topic Machine Learning
Artificial Intelligence
I.2.7
url https://arxiv.org/abs/2402.03175