:: Library Catalog

Bibliographic Details
Main Authors:	Zucchet, Nicolas; Bornschein, Jörg; Chan, Stephanie; Lampinen, Andrew; Pascanu, Razvan; De, Soham
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2503.21676

Similar Items

On the generalization of language models from in-context learning and finetuning: a controlled study
by: Lampinen, Andrew K., et al.
Published: (2025)

Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
by: Rannen-Triki, Amal, et al.
Published: (2024)

The emergence of sparse attention: impact of data distribution and benefits of repetition
by: Zucchet, Nicolas, et al.
Published: (2025)

The Illusion of Stochasticity in LLMs
by: Gu, Xiangming, et al.
Published: (2026)

The broader spectrum of in-context learning
by: Lampinen, Andrew Kyle, et al.
Published: (2024)

Linear representations in language models can change dramatically over a conversation
by: Lampinen, Andrew Kyle, et al.
Published: (2026)

The in-context inductive biases of vision-language models differ across modalities
by: Allen, Kelsey, et al.
Published: (2025)

Transformers need glasses! Information over-squashing in language tasks
by: Barbero, Federico, et al.
Published: (2024)

LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
by: Schmied, Thomas, et al.
Published: (2025)

Round and Round We Go! What makes Rotary Positional Encodings useful?
by: Barbero, Federico, et al.
Published: (2024)

Perplexity Cannot Always Tell Right from Wrong
by: Veličković, Petar, et al.
Published: (2026)

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)

Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
by: Lampinen, Andrew Kyle, et al.
Published: (2025)

Learned feature representations are biased by complexity, learning order, position, and more
by: Lampinen, Andrew Kyle, et al.
Published: (2024)

Transformers meet Neural Algorithmic Reasoners
by: Bounsi, Wilfried, et al.
Published: (2024)

Just-in-time and distributed task representations in language models
by: Li, Yuxuan, et al.
Published: (2025)

Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
by: Gu, Xiangming, et al.
Published: (2026)

Fine-Tuned In-Context Learners for Efficient Adaptation
by: Bornschein, Jorg, et al.
Published: (2025)

Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)

Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists
by: Lewis, Owen, et al.
Published: (2025)

Interpretability Illusions in the Generalization of Simplified Models
by: Friedman, Dan, et al.
Published: (2023)

Kalman Filter for Online Classification of Non-Stationary Data
by: Titsias, Michalis K., et al.
Published: (2023)

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
by: De, Soham, et al.
Published: (2024)

Distinct Computations Emerge From Compositional Curricula in In-Context Learning
by: Lee, Jin Hwa, et al.
Published: (2025)

Survey on reinforcement learning for language processing
by: Uc-Cetina, Victor, et al.
Published: (2021)

Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
by: Goel, Aman, et al.
Published: (2025)

The representation landscape of few-shot learning and fine-tuning in large language models
by: Doimo, Diego, et al.
Published: (2024)

Meta-learning how to Share Credit among Macro-Actions
by: Hosu, Ionel-Alexandru, et al.
Published: (2025)

An evolutionary perspective on modes of learning in Transformers
by: Ku, Alexander Y., et al.
Published: (2025)

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
by: Raposo, David, et al.
Published: (2024)

Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing
by: Rubio-Martín, Sergio, et al.
Published: (2024)

Reducing hallucination in structured outputs via Retrieval-Augmented Generation
by: Béchard, Patrice, et al.
Published: (2024)

Large language models reorganize representational geometry during in-context learning
by: Xiong, Hua-Dong, et al.
Published: (2026)

A meta-analysis on the performance of machine-learning based language models for sentiment analysis
by: Rohde, Elena, et al.
Published: (2025)

Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)

Evaluating Representations with Readout Model Switching
by: Li, Yazhe, et al.
Published: (2023)

Denoising Autoregressive Representation Learning
by: Li, Yazhe, et al.
Published: (2024)

DevBench: A multimodal developmental benchmark for language learning
by: Tan, Alvin Wei Ming, et al.
Published: (2024)

Perturbation: A simple and efficient adversarial tracer for representation learning in language models
by: Rozner, Joshua, et al.
Published: (2026)

Aligning language models with human preferences
by: Korbak, Tomasz
Published: (2024)