| Main Authors: | Zucchet, Nicolas, Bornschein, Jörg, Chan, Stephanie, Lampinen, Andrew, Pascanu, Razvan, De, Soham |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.21676 |
Similar Items
On the generalization of language models from in-context learning and finetuning: a controlled study
by: Lampinen, Andrew K., et al.
Published: (2025)
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
by: Rannen-Triki, Amal, et al.
Published: (2024)
The emergence of sparse attention: impact of data distribution and benefits of repetition
by: Zucchet, Nicolas, et al.
Published: (2025)
The Illusion of Stochasticity in LLMs
by: Gu, Xiangming, et al.
Published: (2026)
The broader spectrum of in-context learning
by: Lampinen, Andrew Kyle, et al.
Published: (2024)
Linear representations in language models can change dramatically over a conversation
by: Lampinen, Andrew Kyle, et al.
Published: (2026)
The in-context inductive biases of vision-language models differ across modalities
by: Allen, Kelsey, et al.
Published: (2025)
Transformers need glasses! Information over-squashing in language tasks
by: Barbero, Federico, et al.
Published: (2024)
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
by: Schmied, Thomas, et al.
Published: (2025)
Round and Round We Go! What makes Rotary Positional Encodings useful?
by: Barbero, Federico, et al.
Published: (2024)
Perplexity Cannot Always Tell Right from Wrong
by: Veličković, Petar, et al.
Published: (2026)
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)
Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
by: Lampinen, Andrew Kyle, et al.
Published: (2025)
Learned feature representations are biased by complexity, learning order, position, and more
by: Lampinen, Andrew Kyle, et al.
Published: (2024)
Transformers meet Neural Algorithmic Reasoners
by: Bounsi, Wilfried, et al.
Published: (2024)
Just-in-time and distributed task representations in language models
by: Li, Yuxuan, et al.
Published: (2025)
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models
by: Gu, Xiangming, et al.
Published: (2026)
Fine-Tuned In-Context Learners for Efficient Adaptation
by: Bornschein, Jorg, et al.
Published: (2025)
Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)
Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists
by: Lewis, Owen, et al.
Published: (2025)
Interpretability Illusions in the Generalization of Simplified Models
by: Friedman, Dan, et al.
Published: (2023)
Kalman Filter for Online Classification of Non-Stationary Data
by: Titsias, Michalis K., et al.
Published: (2023)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
by: De, Soham, et al.
Published: (2024)
Distinct Computations Emerge From Compositional Curricula in In-Context Learning
by: Lee, Jin Hwa, et al.
Published: (2025)
Survey on reinforcement learning for language processing
by: Uc-Cetina, Victor, et al.
Published: (2021)
Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
by: Goel, Aman, et al.
Published: (2025)
The representation landscape of few-shot learning and fine-tuning in large language models
by: Doimo, Diego, et al.
Published: (2024)
Meta-learning how to Share Credit among Macro-Actions
by: Hosu, Ionel-Alexandru, et al.
Published: (2025)
An evolutionary perspective on modes of learning in Transformers
by: Ku, Alexander Y., et al.
Published: (2025)
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
by: Raposo, David, et al.
Published: (2024)
Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing
by: Rubio-Martín, Sergio, et al.
Published: (2024)
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
by: Béchard, Patrice, et al.
Published: (2024)
Large language models reorganize representational geometry during in-context learning
by: Xiong, Hua-Dong, et al.
Published: (2026)
A meta-analysis on the performance of machine-learning based language models for sentiment analysis
by: Rohde, Elena, et al.
Published: (2025)
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)
Evaluating Representations with Readout Model Switching
by: Li, Yazhe, et al.
Published: (2023)
Denoising Autoregressive Representation Learning
by: Li, Yazhe, et al.
Published: (2024)
DevBench: A multimodal developmental benchmark for language learning
by: Tan, Alvin Wei Ming, et al.
Published: (2024)
Perturbation: A simple and efficient adversarial tracer for representation learning in language models
by: Rozner, Joshua, et al.
Published: (2026)
Aligning language models with human preferences
by: Korbak, Tomasz
Published: (2024)