:: Library Catalog

Bibliographic Details
Main Authors:	Raposo, David; Ritter, Sam; Richards, Blake; Lillicrap, Timothy; Humphreys, Peter Conway; Santoro, Adam
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2404.02258

Similar Items

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
by: Bae, Sangmin, et al.
Published: (2025)

Tracing the Representation Geometry of Language Models from Pretraining to Post-training
by: Li, Melody Zixuan, et al.
Published: (2025)

A path to natural language through tokenisation and transformers
by: Berman, David S., et al.
Published: (2026)

Detecting out-of-distribution text using topological features of transformer-based language models
by: Pollano, Andres, et al.
Published: (2023)

MoDification: Mixture of Depths Made Easy
by: Zhang, Chen, et al.
Published: (2024)

Physical models realizing the transformer architecture of large language models
by: Chen, Zeqian
Published: (2025)

Zero-shot data citation function classification using transformer-based large language models (LLMs)
by: Byers, Neil, et al.
Published: (2025)

Comparison of different Unique hard attention transformer models by the formal languages they can recognize
by: Ryvkin, Leonid
Published: (2025)

Training Agents Inside of Scalable World Models
by: Hafner, Danijar, et al.
Published: (2025)

Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)

How do language models learn facts? Dynamics, curricula and hallucinations
by: Zucchet, Nicolas, et al.
Published: (2025)

Question answering system of bridge design specification based on large language model
by: Zhang, Leye, et al.
Published: (2024)

Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves
by: Knupp, Jonas, et al.
Published: (2026)

Auditing language models for hidden objectives
by: Marks, Samuel, et al.
Published: (2025)

Aligning language models with human preferences
by: Korbak, Tomasz
Published: (2024)

Evaluating language models as risk scores
by: Cruz, André F., et al.
Published: (2024)

Exploring prompts to elicit memorization in masked language model-based named entity recognition
by: Xia, Yuxi, et al.
Published: (2024)

Attention based Bidirectional GRU hybrid model for inappropriate content detection in Urdu language
by: Shoukat, Ezzah, et al.
Published: (2025)

Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation
by: Chen, Yilong, et al.
Published: (2026)

Dynamic layer selection in decoder-only transformers
by: Glavas, Theodore, et al.
Published: (2024)

Mastering Diverse Domains through World Models
by: Hafner, Danijar, et al.
Published: (2023)

A comparison of pipelines for the translation of a low resource language based on transformers
by: Bonfanti, Chiara, et al.
Published: (2025)

The language of time: a language model perspective on time-series foundation models
by: Xie, Yi, et al.
Published: (2025)

Amortizing intractable inference in large language models
by: Hu, Edward J., et al.
Published: (2023)

Continuous-Depth Transformers with Learned Control Dynamics
by: Jemley, Peter
Published: (2026)

A meta-analysis on the performance of machine-learning based language models for sentiment analysis
by: Rohde, Elena, et al.
Published: (2025)

Perturbed examples reveal invariances shared by language models
by: Rawal, Ruchit, et al.
Published: (2023)

A mean teacher algorithm for unlearning of language models
by: Klochkov, Yegor
Published: (2025)

Do language models plan ahead for future tokens?
by: Wu, Wilson, et al.
Published: (2024)

Visualizing token importance for black-box language models
by: Rauba, Paulius, et al.
Published: (2025)

Representation in large language models
by: Yetman, Cameron
Published: (2025)

Investigating and Alleviating Harm Amplification in LLM Interactions
by: Guo, Ruohao, et al.
Published: (2026)

Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
by: Guo, Ruohao, et al.
Published: (2023)

Prompt reinforcing for long-term planning of large language models
by: Lin, Hsien-Chin, et al.
Published: (2025)

Machine-generated text detection prevents language model collapse
by: Drayson, George, et al.
Published: (2025)

Fresh in memory: Training-order recency is linearly encoded in language model activations
by: Krasheninnikov, Dmitrii, et al.
Published: (2025)

Language Models can Self-Improve at State-Value Estimation for Better Search
by: Mendes, Ethan, et al.
Published: (2025)

No Need to Talk: Asynchronous Mixture of Language Models
by: Filippova, Anastasiia, et al.
Published: (2024)

Lightweight reranking for language model generations
by: Jain, Siddhartha, et al.
Published: (2023)

Boosting classification reliability of NLP transformer models in the long run
by: Kmetty, Zoltán, et al.
Published: (2023)