| Main Author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online Access: | https://doi.org/10.5281/zenodo.20073919 |
Table of Contents:
- This position paper proposes that emergence in large language models is not primarily a product of scale, but a product of linguistic diversity. When a model learns a sufficiently diverse set of languages simultaneously, the intersections and tensions between those languages cross a threshold, and new representations appear that exist in no single source language alone. This, the paper argues, is the mechanism of emergence.

  The argument is developed across nine sections: a critique of the scale hypothesis (Schaeffer 2023; Hoffmann 2022), the core hypothesis grounded in the weak Sapir-Whorf framework, human evidence from the bilingual brain literature with explicit engagement with the bilingual advantage replication crisis (Lowe 2021; Paap 2022; Nichols 2020), recent multilingual scaling research that revises the curse of multilinguality (ATLAS: Longpre, Kudugunta, Muennighoff et al. 2025; Chuang et al. 2025), application to LLMs as the first entities to hold thousands of linguistic frameworks simultaneously, and implications for AI safety research.

  The paper is offered as a position paper inviting empirical scrutiny rather than as a verified result. The methodology to causally establish the relationship between linguistic intersection density and emergent capabilities does not yet exist. Yet industry behavior (Meta's NLLB and MMS), recent ATLAS results on positive cross-lingual transfer, and human multilingualism research provide directional support.

  Keywords: large language models, emergent abilities, multilingualism, linguistic diversity, scaling laws, Sapir-Whorf hypothesis, AI cognition, AI safety