:: Library Catalog

Bibliographic Details
Main Authors:	Miller, Jack; Gleeson, Patrick; O'Neill, Charles; Bui, Thang; Levi, Noam
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.08946

Similar Items

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
by: Miller, Jack, et al.
Published: (2023)

Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
by: O'Neill, Charles, et al.
Published: (2024)

Grokking at the Edge of Linear Separability
by: Beck, Alon, et al.
Published: (2024)

Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
by: O'Neill, Charles
Published: (2025)

Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding
by: Levi, Noam, et al.
Published: (2023)

Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees
by: O'Neill, Eoghan
Published: (2025)

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)

Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2025)

Likelihood approximations via Gaussian approximate inference
by: Bui, Thang D.
Published: (2024)

A Simple Model of Inference Scaling Laws
by: Levi, Noam
Published: (2024)

Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model
by: Levi, Noam
Published: (2026)

Disentangling Dense Embeddings with Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)

Sketching the Heat Kernel: Using Gaussian Processes to Embed Data
by: Gilbert, Anna C., et al.
Published: (2024)

Beyond ReinMax: Low-Variance Gradient Estimators for Discrete Latent Variables
by: Wang, Daniel, et al.
Published: (2026)

Modelling the Doughnut of social and planetary boundaries with frugal machine learning
by: Vrizzi, Stefano, et al.
Published: (2025)

From superposition to sparse codes: interpretable representations in neural networks
by: Klindt, David, et al.
Published: (2025)

To Grok Grokking: Provable Grokking in Ridge Regression
by: Xu, Mingyue, et al.
Published: (2026)

Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
by: Gu, Zihan, et al.
Published: (2025)

Progress Measures for Grokking on Real-world Tasks
by: Golechha, Satvik
Published: (2024)

A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual Hallucinations
by: O'Neill, Charles, et al.
Published: (2025)

Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited
by: Bui, Thang D., et al.
Published: (2025)

CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
by: Gilbert, Anna C., et al.
Published: (2023)

Low-Rank Key Value Attention
by: O'Neill, James, et al.
Published: (2026)

The Complexity Dynamics of Grokking
by: DeMoss, Branton, et al.
Published: (2024)

Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
by: Minegishi, Gouki, et al.
Published: (2023)

The Implicit Bias of Logit Regularization
by: Beck, Alon, et al.
Published: (2026)

Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets
by: Cohen, Khen, et al.
Published: (2024)

Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher
by: Prakash, Hari K, et al.
Published: (2026)

The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets
by: Levi, Noam, et al.
Published: (2023)

Grokking and Generalization Collapse: Insights from HTSR theory
by: Prakash, Hari K., et al.
Published: (2025)

Tighter sparse variational Gaussian processes
by: Bui, Thang D., et al.
Published: (2025)

Grokked Models are Better Unlearners
by: Liang, Yuanbang, et al.
Published: (2025)

Topological Signatures of Grokking
by: Tang, Yifan, et al.
Published: (2026)

Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
by: Clauw, Kenzo, et al.
Published: (2024)

Pretraining Scaling Laws for Generative Evaluations of Language Models
by: Schaeffer, Rylan, et al.
Published: (2025)

Decoupled Weight Decay for Any $p$ Norm
by: Outmezguine, Nadav Joseph, et al.
Published: (2024)

Exploring Grokking: Experimental and Mechanistic Investigations
by: Qiye, Hu, et al.
Published: (2024)

ILDR: Geometric Early Detection of Grokking
by: Golwala, Shreel
Published: (2026)

Distributional Spectral Diagnostics for Localizing Grokking Transitions
by: Wang, Ziyue, et al.
Published: (2026)

GrokAlign: Geometric Characterisation and Acceleration of Grokking
by: Walker, Thomas, et al.
Published: (2025)