| Main Authors: | Singh, Aaditya K.; Moskovitz, Ted; Dragutinovic, Sara; Hill, Felix; Chan, Stephanie C. Y.; Saxe, Andrew M. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.05631 |
Similar Items
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
by: Singh, Aaditya K., et al.
Published: (2024)
Softmax $\geq$ Linear: Transformers may learn to classify in-context by kernel gradient descent
by: Dragutinović, Sara, et al.
Published: (2025)
Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)
HARP: A challenging human-annotated math reasoning benchmark
by: Yue, Albert S., et al.
Published: (2024)
Distinct Computations Emerge From Compositional Curricula in In-Context Learning
by: Lee, Jin Hwa, et al.
Published: (2025)
To Use or not to Use Muon: How Simplicity Bias in Optimizers Matters
by: Dragutinović, Sara, et al.
Published: (2026)
The broader spectrum of in-context learning
by: Lampinen, Andrew Kyle, et al.
Published: (2024)
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
by: Zhang, Yedi, et al.
Published: (2025)
Machine Learning-Augmented Optimization of Large Bilevel and Two-stage Stochastic Programs: Application to Cycling Network Design
by: Chan, Timothy C. Y., et al.
Published: (2022)
Meta-Learning Strategies through Value Maximization in Neural Networks
by: Carrasco-Davis, Rodrigo, et al.
Published: (2023)
When Representations Align: Universality in Representation Learning Dynamics
by: van Rossem, Loek, et al.
Published: (2024)
Algorithm Development in Neural Networks: Insights from the Streaming Parity Task
by: van Rossem, Loek, et al.
Published: (2025)
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
by: Jarvis, Devon, et al.
Published: (2025)
Nonlinear dynamics of localization in neural receptive fields
by: Lufkin, Leon, et al.
Published: (2025)
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
by: Singh, Aaditya K., et al.
Published: (2024)
Learned feature representations are biased by complexity, learning order, position, and more
by: Lampinen, Andrew Kyle, et al.
Published: (2024)
Understanding Unimodal Bias in Multimodal Deep Linear Networks
by: Zhang, Yedi, et al.
Published: (2023)
When Are Bias-Free ReLU Networks Effectively Linear Networks?
by: Zhang, Yedi, et al.
Published: (2024)
Bayes' Power for Explaining In-Context Learning Generalizations
by: Müller, Samuel, et al.
Published: (2024)
CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation
by: Mi, Zhendong, et al.
Published: (2024)
Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)
On The Specialization of Neural Modules
by: Jarvis, Devon, et al.
Published: (2024)
Learning by Self-Explaining
by: Stammer, Wolfgang, et al.
Published: (2023)
Early learning of the optimal constant solution in neural networks and humans
by: Rubruck, Jirko, et al.
Published: (2024)
Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
by: Neopane, Ojash, et al.
Published: (2024)
The emergence of sparse attention: impact of data distribution and benefits of repetition
by: Zucchet, Nicolas, et al.
Published: (2025)
Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
by: Neopane, Ojash, et al.
Published: (2025)
In-Context Learning Strategies Emerge Rationally
by: Wurgaft, Daniel, et al.
Published: (2025)
Optimal Learning Rate Schedule for Balancing Effort and Performance
by: Njaradi, Valentina, et al.
Published: (2026)
Towards Provable Emergence of In-Context Reinforcement Learning
by: Wang, Jiuqi, et al.
Published: (2025)
Revisiting the Role of Relearning in Semantic Dementia
by: Jarvis, Devon, et al.
Published: (2025)
Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing
by: Njaradi, Valentina, et al.
Published: (2026)
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
by: Dominé, Clémentine C. J., et al.
Published: (2024)
Representation biases: will we achieve complete understanding by analyzing representations?
by: Lampinen, Andrew Kyle, et al.
Published: (2025)
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
by: Sakamoto, Keitaro, et al.
Published: (2025)
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions
by: Patel, Nishil, et al.
Published: (2023)
Context and Diversity Matter: The Emergence of In-Context Learning in World Models
by: Wang, Fan, et al.
Published: (2025)
Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
by: Xie, Zixuan, et al.
Published: (2026)
Emergence of In-Context Reinforcement Learning from Noise Distillation
by: Zisman, Ilya, et al.
Published: (2023)
Task Vectors in In-Context Learning: Emergence, Formation, and Benefit
by: Yang, Liu, et al.
Published: (2025)