:: Library Catalog

Bibliographic Details
Main Authors:	Singh, Aaditya K., Moskovitz, Ted, Dragutinović, Sara, Hill, Felix, Chan, Stephanie C. Y., Saxe, Andrew M.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.05631

Similar Items

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
by: Singh, Aaditya K., et al.
Published: (2024)

Softmax $\geq$ Linear: Transformers may learn to classify in-context by kernel gradient descent
by: Dragutinović, Sara, et al.
Published: (2025)

Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)

HARP: A challenging human-annotated math reasoning benchmark
by: Yue, Albert S., et al.
Published: (2024)

Distinct Computations Emerge From Compositional Curricula in In-Context Learning
by: Lee, Jin Hwa, et al.
Published: (2025)

To Use or not to Use Muon: How Simplicity Bias in Optimizers Matters
by: Dragutinović, Sara, et al.
Published: (2026)

The broader spectrum of in-context learning
by: Lampinen, Andrew Kyle, et al.
Published: (2024)

Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
by: Zhang, Yedi, et al.
Published: (2025)

Machine Learning-Augmented Optimization of Large Bilevel and Two-stage Stochastic Programs: Application to Cycling Network Design
by: Chan, Timothy C. Y., et al.
Published: (2022)

Meta-Learning Strategies through Value Maximization in Neural Networks
by: Carrasco-Davis, Rodrigo, et al.
Published: (2023)

When Representations Align: Universality in Representation Learning Dynamics
by: van Rossem, Loek, et al.
Published: (2024)

Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
by: van Rossem, Loek, et al.
Published: (2025)

Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
by: Jarvis, Devon, et al.
Published: (2025)

Nonlinear dynamics of localization in neural receptive fields
by: Lufkin, Leon, et al.
Published: (2025)

Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
by: Singh, Aaditya K., et al.
Published: (2024)

Learned feature representations are biased by complexity, learning order, position, and more
by: Lampinen, Andrew Kyle, et al.
Published: (2024)

Understanding Unimodal Bias in Multimodal Deep Linear Networks
by: Zhang, Yedi, et al.
Published: (2023)

When Are Bias-Free ReLU Networks Effectively Linear Networks?
by: Zhang, Yedi, et al.
Published: (2024)

Bayes' Power for Explaining In-Context Learning Generalizations
by: Müller, Samuel, et al.
Published: (2024)

CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation
by: Mi, Zhendong, et al.
Published: (2024)

Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)

On The Specialization of Neural Modules
by: Jarvis, Devon, et al.
Published: (2024)

Learning by Self-Explaining
by: Stammer, Wolfgang, et al.
Published: (2023)

Early learning of the optimal constant solution in neural networks and humans
by: Rubruck, Jirko, et al.
Published: (2024)

Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
by: Neopane, Ojash, et al.
Published: (2024)

The emergence of sparse attention: impact of data distribution and benefits of repetition
by: Zucchet, Nicolas, et al.
Published: (2025)

Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
by: Neopane, Ojash, et al.
Published: (2025)

In-Context Learning Strategies Emerge Rationally
by: Wurgaft, Daniel, et al.
Published: (2025)

Optimal Learning Rate Schedule for Balancing Effort and Performance
by: Njaradi, Valentina, et al.
Published: (2026)

Towards Provable Emergence of In-Context Reinforcement Learning
by: Wang, Jiuqi, et al.
Published: (2025)

Revisiting the Role of Relearning in Semantic Dementia
by: Jarvis, Devon, et al.
Published: (2025)

Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing
by: Njaradi, Valentina, et al.
Published: (2026)

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
by: Dominé, Clémentine C. J., et al.
Published: (2024)

Representation biases: will we achieve complete understanding by analyzing representations?
by: Lampinen, Andrew Kyle, et al.
Published: (2025)

Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
by: Sakamoto, Keitaro, et al.
Published: (2025)

The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions
by: Patel, Nishil, et al.
Published: (2023)

Context and Diversity Matter: The Emergence of In-Context Learning in World Models
by: Wang, Fan, et al.
Published: (2025)

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
by: Xie, Zixuan, et al.
Published: (2026)

Emergence of In-Context Reinforcement Learning from Noise Distillation
by: Zisman, Ilya, et al.
Published: (2023)

Task Vectors in In-Context Learning: Emergence, Formation, and Benefit
by: Yang, Liu, et al.
Published: (2025)