Library Catalog

Bibliographic Details
Main Authors:	Sakamoto, Keitaro; Sato, Issei
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2509.20829

Similar Items

End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training
by: Sakamoto, Keitaro, et al.
Published: (2024)

Benign Overfitting in Token Selection of Attention Mechanism
by: Sakamoto, Keitaro, et al.
Published: (2024)

Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment
by: Hasegawa, Naoya, et al.
Published: (2024)

Can Test-time Computation Mitigate Reproduction Bias in Neural Symbolic Regression?
by: Sato, Shun, et al.
Published: (2025)

Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)

Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
by: Han, Ting, et al.
Published: (2025)

Understanding Generalization in Physics Informed Models through Affine Variety Dimensions
by: Koshizuka, Takeshi, et al.
Published: (2025)

Grokking Explained: A Statistical Phenomenon
by: Carvalho, Breno W., et al.
Published: (2025)

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
by: Nam, Yoonsoo, et al.
Published: (2025)

Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
by: Tomihari, Akiyoshi, et al.
Published: (2026)

Exploring Weight Balancing on Long-Tailed Recognition Problem
by: Hasegawa, Naoya, et al.
Published: (2023)

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
by: Xu, Kevin, et al.
Published: (2024)

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
by: Tomihari, Akiyoshi, et al.
Published: (2024)

Top-Down Bayesian Posterior Sampling for Sum-Product Networks
by: Yokoi, Soma, et al.
Published: (2024)

Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher
by: Prakash, Hari K., et al.
Published: (2026)

Understanding the Expressivity and Trainability of Fourier Neural Operator: A Mean-Field Perspective
by: Koshizuka, Takeshi, et al.
Published: (2023)

To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
by: Xu, Kevin, et al.
Published: (2025)

Max-pooling Network Revisited: Analyzing the Role of Semantic Probability in Multiple Instance Learning for Hallucination Detection
by: Fujikawa, Shota, et al.
Published: (2026)

Rethinking Associative Memory Mechanism in Induction Head
by: Wang, Shuo, et al.
Published: (2024)

On the Optimal Memorization Capacity of Transformers
by: Kajitsuka, Tokio, et al.
Published: (2024)

Fix Initial Codes and Iteratively Refine Textual Directions Toward Safe Multi-Turn Code Correction
by: Tanaka, Yuto, et al.
Published: (2026)

Grokking and Generalization Collapse: Insights from HTSR theory
by: Prakash, Hari K., et al.
Published: (2025)

A Formal Comparison Between Chain of Thought and Latent Thought
by: Xu, Kevin, et al.
Published: (2025)

Understanding Transformer Optimization via Gradient Heterogeneity
by: Tomihari, Akiyoshi, et al.
Published: (2025)

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
by: Kajitsuka, Tokio, et al.
Published: (2023)

Locking Pretrained Weights via Deep Low-Rank Residual Distillation
by: Sakamoto, Keitaro, et al.
Published: (2026)

To Grok Grokking: Provable Grokking in Ridge Regression
by: Xu, Mingyue, et al.
Published: (2026)

Mutual Information Collapse Explains Disentanglement Failure in β-VAEs
by: Vu, Minh, et al.
Published: (2026)

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
by: Zhou, Xinyu, et al.
Published: (2025)

Explaining and Preventing Alignment Collapse in Iterative RLHF
by: Gauthier, Etienne, et al.
Published: (2026)

The Complexity Dynamics of Grokking
by: DeMoss, Branton, et al.
Published: (2024)

Measuring Sharpness in Grokking
by: Miller, Jack, et al.
Published: (2024)

Deep Grokking: Would Deep Neural Networks Generalize Better?
by: Fan, Simin, et al.
Published: (2024)

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
by: Miller, Jack, et al.
Published: (2023)

Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
by: Minegishi, Gouki, et al.
Published: (2023)

Directional Neural Collapse Explains Few-Shot Transfer in Self-Supervised Learning
by: Luthra, Achleshwar, et al.
Published: (2026)

Can Kernel Methods Explain How the Data Affects Neural Collapse?
by: Kothapalli, Vignesh, et al.
Published: (2024)

Aligning Multimodal Representations through an Information Bottleneck
by: Almudévar, Antonio, et al.
Published: (2025)

Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds
by: Song, Yiding, et al.
Published: (2026)