| Main Authors: | Michaelov, James A., Bergen, Benjamin K. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.14681 |
Similar Items
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
by: Michaelov, James A., et al.
Published: (2024)
Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale
by: Michaelov, James A., et al.
Published: (2025)
Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events
by: Michaelov, James A., et al.
Published: (2025)
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
by: Arnett, Catherine, et al.
Published: (2025)
Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction
by: Michaelov, James A., et al.
Published: (2025)
N-gram-like Language Models Predict Reading Time Best
by: Michaelov, James A., et al.
Published: (2026)
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
by: Chang, Tyler A., et al.
Published: (2025)
Why do language models perform worse for morphologically complex languages?
by: Arnett, Catherine, et al.
Published: (2024)
How Open Must Language Models be to Enable Reliable Scientific Inference?
by: Michaelov, James A., et al.
Published: (2026)
Do Large Language Models Exhibit Spontaneous Rational Deception?
by: Taylor, Samuel M., et al.
Published: (2025)
Large Language Models Pass the Turing Test
by: Jones, Cameron R., et al.
Published: (2025)
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
by: Trott, Sean, et al.
Published: (2026)
Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability
by: Chang, Tyler A., et al.
Published: (2023)
A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages
by: Arnett, Catherine, et al.
Published: (2024)
EVOKE: Emotion Vocabulary Of Korean and English
by: Jung, Yoonwon, et al.
Published: (2026)
Does GPT-4 pass the Turing test?
by: Jones, Cameron R., et al.
Published: (2023)
Synthetic bootstrapped pretraining
by: Yang, Zitong, et al.
Published: (2025)
Goldfish: Monolingual Language Models for 350 Languages
by: Chang, Tyler A., et al.
Published: (2024)
Explaining and Mitigating Crosslingual Tokenizer Inequities
by: Arnett, Catherine, et al.
Published: (2025)
Synthetic continued pretraining
by: Yang, Zitong, et al.
Published: (2024)
Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?
by: Pi, Zhiqiang, et al.
Published: (2024)
GPT-4 is judged more human than humans in displaced and inverted Turing tests
by: Rathi, Ishika, et al.
Published: (2024)
Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models
by: Jones, Cameron R., et al.
Published: (2024)
The Book of Life approach: Enabling richness and scale for life course research
by: Verhagen, Mark D., et al.
Published: (2025)
Prompting from the bench: Large-scale pretraining is not sufficient to prepare LLMs for ordinary meaning analysis
by: Purushothama, Abhishek, et al.
Published: (2025)
TextGram: Towards a better domain-adaptive pretraining
by: Hiwarkhedkar, Sharayu, et al.
Published: (2024)
Do pretrained Transformers Learn In-Context by Gradient Descent?
by: Shen, Lingfeng, et al.
Published: (2023)
Analyzing the relationships between pretraining language, phonetic, tonal, and speaker information in self-supervised speech models
by: Gubian, Michele, et al.
Published: (2025)
LLMs and people both learn to form conventions -- just not with each other
by: Jones, Cameron R., et al.
Published: (2026)
How far can bias go? Tracing bias from pretraining data to alignment
by: Thaler, Marion, et al.
Published: (2024)
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
by: Liang, Wen, et al.
Published: (2024)
MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model
by: Reddy, K. Sahit, et al.
Published: (2025)
Proving membership in LLM pretraining data via data watermarks
by: Wei, Johnny Tian-Zheng, et al.
Published: (2024)
Tgea: An error-annotated dataset and benchmark tasks for text generation from pretrained language models
by: He, Jie, et al.
Published: (2025)
Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation
by: Chirkova, Nadezhda, et al.
Published: (2023)
The effectiveness of MAE pre-pretraining for billion-scale pretraining
by: Singh, Mannat, et al.
Published: (2023)
GLAP: General contrastive audio-text pretraining across domains and languages
by: Dinkel, Heinrich, et al.
Published: (2025)
Emergent effects of scaling on the functional hierarchies within large language models
by: Bogdan, Paul C.
Published: (2025)
ks-pret-5m: a 5 million word, 12 million token kashmiri pretraining dataset
by: Malik, Haq Nawaz, et al.
Published: (2026)
[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
by: Choshen, Leshem, et al.
Published: (2024)