| Main Authors: | Prakash, Hari K., Martin, Charles H. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.04434 |
Similar Items
Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher
by: Prakash, Hari K, et al.
Published: (2026)
Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory
by: Prakash, Hari K., et al.
Published: (2026)
Topological Signatures of Grokking
by: Tang, Yifan, et al.
Published: (2026)
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
by: Zhou, Xinyu, et al.
Published: (2025)
Grokking Explained: A Statistical Phenomenon
by: Carvalho, Breno W., et al.
Published: (2025)
Grokking in Linear Models for Logistic Regression
by: Das, Nataraj, et al.
Published: (2026)
Controlling Grokking with Nonlinearity and Data Symmetry
by: Salah, Ahmed, et al.
Published: (2024)
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
by: Lee, Jaerin, et al.
Published: (2024)
Understanding Grokking Through A Robustness Viewpoint
by: Tan, Zhiquan, et al.
Published: (2023)
Progress Measures for Grokking on Real-world Tasks
by: Golechha, Satvik
Published: (2024)
Muon Optimizer Accelerates Grokking
by: Tveit, Amund, et al.
Published: (2025)
Grokking Finite-Dimensional Algebra
by: Notsawo, Pascal Jr Tikeng, et al.
Published: (2026)
Grokking Group Multiplication with Cosets
by: Stander, Dashiell, et al.
Published: (2023)
$\texttt{lrnnx}$: A library for Linear RNNs
by: Bania, Karan, et al.
Published: (2026)
The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization
by: Khanh, Truong Xuan, et al.
Published: (2026)
Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation
by: Salah, Ahmed, et al.
Published: (2025)
The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold
by: Musat, Tiberiu
Published: (2025)
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
by: Xu, Yongzhong
Published: (2026)
Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)
Early-Warning Signals of Grokking via Loss-Landscape Geometry
by: Xu, Yongzhong
Published: (2026)
Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
by: Hidajat, Kai, et al.
Published: (2026)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
by: Jeffares, Alan, et al.
Published: (2024)
Grokking as a Falsifiable Finite-Size Transition
by: Bi, Yuda, et al.
Published: (2026)
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
by: Park, Yeachan, et al.
Published: (2024)
Grokking at the Edge of Numerical Stability
by: Prieto, Lucas, et al.
Published: (2025)
Reinforcement Learning Based Escape Route Generation in Low Visibility Environments
by: Srikanth, Hari
Published: (2024)
$\texttt{SEM-CTRL}$: Semantically Controlled Decoding
by: Albinhassan, Mohammad, et al.
Published: (2025)
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)
Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent
by: Chou, Chi-Ning, et al.
Published: (2026)
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)
Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
by: Xu, Yongzhong
Published: (2026)
Grokking Beyond the Euclidean Norm of Model Parameters
by: Notsawo, Pascal Jr Tikeng, et al.
Published: (2025)
$\texttt{SynC}$: Synergistic Boosting of Structure and Representation for Deep Graph Clustering
by: Ding, Shifei, et al.
Published: (2024)
$\texttt{MiniMol}$: A Parameter-Efficient Foundation Model for Molecular Learning
by: Kläser, Kerstin, et al.
Published: (2024)
Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation
by: Westerhoff, Justus, et al.
Published: (2025)
Critical Data Size of Language Models from a Grokking Perspective
by: Zhu, Xuekai, et al.
Published: (2024)
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
by: Xu, Yongzhong
Published: (2026)
Grokking as a Variance-Limited Phase Transition: Spectral Gating and the Epsilon-Stability Threshold
by: Acharya, Pratyush, et al.
Published: (2026)
First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
by: Khanh, Truong Xuan, et al.
Published: (2026)
$\texttt{AMEND++}$: Benchmarking Eligibility Criteria Amendments in Clinical Trials
by: Das, Trisha, et al.
Published: (2026)