| Main Authors: | Miller, Jack, Gleeson, Patrick, O'Neill, Charles, Bui, Thang, Levi, Noam |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.08946 |
Similar Items
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
by: Miller, Jack, et al.
Published: (2023)
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
by: O'Neill, Charles, et al.
Published: (2024)
Grokking at the Edge of Linear Separability
by: Beck, Alon, et al.
Published: (2024)
Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
by: O'Neill, Charles
Published: (2025)
Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding
by: Levi, Noam, et al.
Published: (2023)
Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees
by: O'Neill, Eoghan
Published: (2025)
Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2025)
Likelihood approximations via Gaussian approximate inference
by: Bui, Thang D.
Published: (2024)
A Simple Model of Inference Scaling Laws
by: Levi, Noam
Published: (2024)
Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model
by: Levi, Noam
Published: (2026)
Disentangling Dense Embeddings with Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)
Sketching the Heat Kernel: Using Gaussian Processes to Embed Data
by: Gilbert, Anna C., et al.
Published: (2024)
Beyond ReinMax: Low-Variance Gradient Estimators for Discrete Latent Variables
by: Wang, Daniel, et al.
Published: (2026)
Modelling the Doughnut of social and planetary boundaries with frugal machine learning
by: Vrizzi, Stefano, et al.
Published: (2025)
From superposition to sparse codes: interpretable representations in neural networks
by: Klindt, David, et al.
Published: (2025)
To Grok Grokking: Provable Grokking in Ridge Regression
by: Xu, Mingyue, et al.
Published: (2026)
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
by: Gu, Zihan, et al.
Published: (2025)
Progress Measures for Grokking on Real-world Tasks
by: Golechha, Satvik
Published: (2024)
A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual Hallucinations
by: O'Neill, Charles, et al.
Published: (2025)
Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited
by: Bui, Thang D., et al.
Published: (2025)
CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
by: Gilbert, Anna C., et al.
Published: (2023)
Low-Rank Key Value Attention
by: O'Neill, James, et al.
Published: (2026)
The Complexity Dynamics of Grokking
by: DeMoss, Branton, et al.
Published: (2024)
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
by: Minegishi, Gouki, et al.
Published: (2023)
The Implicit Bias of Logit Regularization
by: Beck, Alon, et al.
Published: (2026)
Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets
by: Cohen, Khen, et al.
Published: (2024)
Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher
by: Prakash, Hari K, et al.
Published: (2026)
The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets
by: Levi, Noam, et al.
Published: (2023)
Grokking and Generalization Collapse: Insights from HTSR theory
by: Prakash, Hari K., et al.
Published: (2025)
Tighter sparse variational Gaussian processes
by: Bui, Thang D., et al.
Published: (2025)
Grokked Models are Better Unlearners
by: Liang, Yuanbang, et al.
Published: (2025)
Topological Signatures of Grokking
by: Tang, Yifan, et al.
Published: (2026)
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
by: Clauw, Kenzo, et al.
Published: (2024)
Pretraining Scaling Laws for Generative Evaluations of Language Models
by: Schaeffer, Rylan, et al.
Published: (2025)
Decoupled Weight Decay for Any $p$ Norm
by: Outmezguine, Nadav Joseph, et al.
Published: (2024)
Exploring Grokking: Experimental and Mechanistic Investigations
by: Qiye, Hu, et al.
Published: (2024)
ILDR: Geometric Early Detection of Grokking
by: Golwala, Shreel
Published: (2026)
Distributional Spectral Diagnostics for Localizing Grokking Transitions
by: Wang, Ziyue, et al.
Published: (2026)
GrokAlign: Geometric Characterisation and Acceleration of Grokking
by: Walker, Thomas, et al.
Published: (2025)