| Main Authors: | Raposo, David, Ritter, Sam, Richards, Blake, Lillicrap, Timothy, Humphreys, Peter Conway, Santoro, Adam |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.02258 |
Similar Items
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
by: Bae, Sangmin, et al.
Published: (2025)
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
by: Li, Melody Zixuan, et al.
Published: (2025)
A path to natural language through tokenisation and transformers
by: Berman, David S., et al.
Published: (2026)
Detecting out-of-distribution text using topological features of transformer-based language models
by: Pollano, Andres, et al.
Published: (2023)
MoDification: Mixture of Depths Made Easy
by: Zhang, Chen, et al.
Published: (2024)
Physical models realizing the transformer architecture of large language models
by: Chen, Zeqian
Published: (2025)
Zero-shot data citation function classification using transformer-based large language models (LLMs)
by: Byers, Neil, et al.
Published: (2025)
Comparison of different Unique hard attention transformer models by the formal languages they can recognize
by: Ryvkin, Leonid
Published: (2025)
Training Agents Inside of Scalable World Models
by: Hafner, Danijar, et al.
Published: (2025)
Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)
How do language models learn facts? Dynamics, curricula and hallucinations
by: Zucchet, Nicolas, et al.
Published: (2025)
Question answering system of bridge design specification based on large language model
by: Zhang, Leye, et al.
Published: (2024)
Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves
by: Knupp, Jonas, et al.
Published: (2026)
Auditing language models for hidden objectives
by: Marks, Samuel, et al.
Published: (2025)
Aligning language models with human preferences
by: Korbak, Tomasz
Published: (2024)
Evaluating language models as risk scores
by: Cruz, André F., et al.
Published: (2024)
Exploring prompts to elicit memorization in masked language model-based named entity recognition
by: Xia, Yuxi, et al.
Published: (2024)
Attention based Bidirectional GRU hybrid model for inappropriate content detection in Urdu language
by: Shoukat, Ezzah, et al.
Published: (2025)
Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation
by: Chen, Yilong, et al.
Published: (2026)
Dynamic layer selection in decoder-only transformers
by: Glavas, Theodore, et al.
Published: (2024)
Mastering Diverse Domains through World Models
by: Hafner, Danijar, et al.
Published: (2023)
A comparison of pipelines for the translation of a low resource language based on transformers
by: Bonfanti, Chiara, et al.
Published: (2025)
The language of time: a language model perspective on time-series foundation models
by: Xie, Yi, et al.
Published: (2025)
Amortizing intractable inference in large language models
by: Hu, Edward J., et al.
Published: (2023)
Continuous-Depth Transformers with Learned Control Dynamics
by: Jemley, Peter
Published: (2026)
A meta-analysis on the performance of machine-learning based language models for sentiment analysis
by: Rohde, Elena, et al.
Published: (2025)
Perturbed examples reveal invariances shared by language models
by: Rawal, Ruchit, et al.
Published: (2023)
A mean teacher algorithm for unlearning of language models
by: Klochkov, Yegor
Published: (2025)
Do language models plan ahead for future tokens?
by: Wu, Wilson, et al.
Published: (2024)
Visualizing token importance for black-box language models
by: Rauba, Paulius, et al.
Published: (2025)
Representation in large language models
by: Yetman, Cameron
Published: (2025)
Investigating and Alleviating Harm Amplification in LLM Interactions
by: Guo, Ruohao, et al.
Published: (2026)
Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
by: Guo, Ruohao, et al.
Published: (2023)
Prompt reinforcing for long-term planning of large language models
by: Lin, Hsien-Chin, et al.
Published: (2025)
Machine-generated text detection prevents language model collapse
by: Drayson, George, et al.
Published: (2025)
Fresh in memory: Training-order recency is linearly encoded in language model activations
by: Krasheninnikov, Dmitrii, et al.
Published: (2025)
Language Models can Self-Improve at State-Value Estimation for Better Search
by: Mendes, Ethan, et al.
Published: (2025)
No Need to Talk: Asynchronous Mixture of Language Models
by: Filippova, Anastasiia, et al.
Published: (2024)
Lightweight reranking for language model generations
by: Jain, Siddhartha, et al.
Published: (2023)
Boosting classification reliability of NLP transformer models in the long run
by: Kmetty, Zoltán, et al.
Published: (2023)