Saved in:
| Main Authors: | , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.03175 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866916407843749888 |
|---|---|
| author | Dalal, Siddhartha Misra, Vishal |
| author_facet | Dalal, Siddhartha Misra, Vishal |
| contents | This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs), focusing on their core optimization metric of next token prediction. We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix. Key contributions include: (i) a continuity theorem relating embeddings to multinomial distributions, (ii) a demonstration that LLM text generation aligns with Bayesian learning principles, (iii) an explanation for the emergence of in-context learning in larger models, (iv) empirical validation using visualizations of next token probabilities from an instrumented Llama model Our findings provide new insights into LLM functioning, offering a statistical foundation for understanding their capabilities and limitations. This framework has implications for LLM design, training, and application, potentially guiding future developments in the field. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2402_03175 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference Dalal, Siddhartha Misra, Vishal Machine Learning Artificial Intelligence I.2.7 This paper introduces a novel Bayesian learning model to explain the behavior of Large Language Models (LLMs), focusing on their core optimization metric of next token prediction. We develop a theoretical framework based on an ideal generative text model represented by a multinomial transition probability matrix with a prior, and examine how LLMs approximate this matrix. Key contributions include: (i) a continuity theorem relating embeddings to multinomial distributions, (ii) a demonstration that LLM text generation aligns with Bayesian learning principles, (iii) an explanation for the emergence of in-context learning in larger models, (iv) empirical validation using visualizations of next token probabilities from an instrumented Llama model Our findings provide new insights into LLM functioning, offering a statistical foundation for understanding their capabilities and limitations. This framework has implications for LLM design, training, and application, potentially guiding future developments in the field. |
| title | Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference |
| topic | Machine Learning Artificial Intelligence I.2.7 |
| url | https://arxiv.org/abs/2402.03175 |