| Main Authors: | Kim, Joon Hyeok, Park, Yong-Hyun, Østby, Mattis Dalsætra, Gu, Jiatao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.17673 |
Similar Items
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking
by: He, Jianliang, et al.
Published: (2026)
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
by: Mohamadi, Mohamad Amin, et al.
Published: (2024)
Grokking Modular Polynomials
by: Doshi, Darshil, et al.
Published: (2024)
Survival of the Fittest Representation: A Case Study with Modular Addition
by: Ding, Xiaoman Delores, et al.
Published: (2024)
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
by: Park, Yeachan, et al.
Published: (2024)
Matryoshka Diffusion Models
by: Gu, Jiatao, et al.
Published: (2023)
Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
by: Swaroop, Anand
Published: (2026)
To Grok Grokking: Provable Grokking in Ridge Regression
by: Xu, Mingyue, et al.
Published: (2026)
Grokked Models are Better Unlearners
by: Liang, Yuanbang, et al.
Published: (2025)
Resolution Chromatography of Diffusion Models
by: Hwang, Juno, et al.
Published: (2023)
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
by: Gu, Zihan, et al.
Published: (2025)
The Complexity Dynamics of Grokking
by: DeMoss, Branton, et al.
Published: (2024)
Measuring Sharpness in Grokking
by: Miller, Jack, et al.
Published: (2024)
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
by: Minegishi, Gouki, et al.
Published: (2023)
TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics
by: Chen, Tianrong, et al.
Published: (2025)
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
by: Zheng, Huangjie, et al.
Published: (2025)
Grokking in Linear Models for Logistic Regression
by: Das, Nataraj, et al.
Published: (2026)
OmniGuide: Universal Guidance Fields for Enhancing Generalist Robot Policies
by: Song, Yunzhou, et al.
Published: (2026)
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
by: Lee, Jaerin, et al.
Published: (2024)
Topological Signatures of Grokking
by: Tang, Yifan, et al.
Published: (2026)
Provable Benefits of Sinusoidal Activation for Modular Addition
by: Huang, Tianlong, et al.
Published: (2025)
Random Gradient Masking as a Defensive Measure to Deep Leakage in Federated Learning
by: Kim, Joon, et al.
Published: (2024)
Fitting trees to $\ell_1$-hyperbolic distances
by: Yim, Joon-Hyeok, et al.
Published: (2024)
A Systematic Empirical Study of Grokking: Depth, Architecture, Activation, and Regularization
by: Manir, Shalima Binta, et al.
Published: (2026)
The Sparse Tsetlin Machine: Sparse Representation with Active Literals
by: Østby, Sebastian, et al.
Published: (2024)
ILDR: Geometric Early Detection of Grokking
by: Golwala, Shreel
Published: (2026)
Exploring Grokking: Experimental and Mechanistic Investigations
by: Qiye, Hu, et al.
Published: (2024)
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
by: Miller, Jack, et al.
Published: (2023)
WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory
by: Kim, Jin Sob, et al.
Published: (2025)
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
by: Kim, Seo Hyun, et al.
Published: (2025)
Learning Large-Scale Modular Addition with an Auxiliary Modulus
by: Kikuchi, Hanato, et al.
Published: (2026)
Clustering and Alignment: Understanding the Training Dynamics in Modular Addition
by: Musat, Tiberiu
Published: (2024)
Towards Privacy-Preserving Relational Data Synthesis via Probabilistic Relational Models
by: Luttermann, Malte, et al.
Published: (2024)
Decoding Fatigue Levels of Pilots Using EEG Signals with Hybrid Deep Neural Networks
by: Lee, Dae-Hyeok, et al.
Published: (2024)
Distributional Spectral Diagnostics for Localizing Grokking Transitions
by: Wang, Ziyue, et al.
Published: (2026)
GrokAlign: Geometric Characterisation and Acceleration of Grokking
by: Walker, Thomas, et al.
Published: (2025)
Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds
by: Song, Yiding, et al.
Published: (2026)
Training Unbiased Diffusion Models From Biased Dataset
by: Kim, Yeongmin, et al.
Published: (2024)
Green Tsetlin Redefining Efficiency in Tsetlin Machine Frameworks
by: Glimsdal, Sondre, et al.
Published: (2024)