| Main authors: | Tan, Zhiquan; Huang, Weiran |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2311.06597 |
Similar items
The Information of Large Language Model Geometry
by: Tan, Zhiquan, et al.
Published: (2024)
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
by: Li, Hong, et al.
Published: (2024)
Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
by: Wei, Lai, et al.
Published: (2024)
Matrix Information Theory for Self-Supervised Learning
by: Zhang, Yifan, et al.
Published: (2023)
Provable Contrastive Continual Learning
by: Wen, Yichen, et al.
Published: (2024)
Topological Signatures of Grokking
by: Tang, Yifan, et al.
Published: (2026)
Information-Theoretic Perspectives on Optimizers
by: Tan, Zhiquan, et al.
Published: (2025)
Grokking Explained: A Statistical Phenomenon
by: Carvalho, Breno W., et al.
Published: (2025)
Grokking in Linear Models for Logistic Regression
by: Das, Nataraj, et al.
Published: (2026)
Controlling Grokking with Nonlinearity and Data Symmetry
by: Salah, Ahmed, et al.
Published: (2024)
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
by: Lee, Jaerin, et al.
Published: (2024)
Progress Measures for Grokking on Real-world Tasks
by: Golechha, Satvik
Published: (2024)
Grokking Group Multiplication with Cosets
by: Stander, Dashiell, et al.
Published: (2023)
Grokking Finite-Dimensional Algebra
by: Notsawo, Pascal Jr Tikeng, et al.
Published: (2026)
Muon Optimizer Accelerates Grokking
by: Tveit, Amund, et al.
Published: (2025)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
by: Jeffares, Alan, et al.
Published: (2024)
Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation
by: Salah, Ahmed, et al.
Published: (2025)
Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
by: Xu, Yongzhong
Published: (2026)
The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold
by: Musat, Tiberiu
Published: (2025)
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
by: Zhou, Xinyu, et al.
Published: (2025)
Early-Warning Signals of Grokking via Loss-Landscape Geometry
by: Xu, Yongzhong
Published: (2026)
Grokking and Generalization Collapse: Insights from HTSR theory
by: Prakash, Hari K., et al.
Published: (2025)
Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
by: Hidajat, Kai, et al.
Published: (2026)
Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent
by: Chou, Chi-Ning, et al.
Published: (2026)
Grokking as a Falsifiable Finite-Size Transition
by: Bi, Yuda, et al.
Published: (2026)
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
by: Park, Yeachan, et al.
Published: (2024)
Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)
Grokking at the Edge of Numerical Stability
by: Prieto, Lucas, et al.
Published: (2025)
A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
by: Shen, Yiyang, et al.
Published: (2025)
The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization
by: Khanh, Truong Xuan, et al.
Published: (2026)
Lean Finder: Semantic Search for Mathlib That Understands User Intents
by: Lu, Jialin, et al.
Published: (2025)
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)
Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
by: Xu, Yongzhong
Published: (2026)
Grokking Beyond the Euclidean Norm of Model Parameters
by: Notsawo, Pascal Jr Tikeng, et al.
Published: (2025)
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
by: Xu, Yongzhong
Published: (2026)
Grokking as a Variance-Limited Phase Transition: Spectral Gating and the Epsilon-Stability Threshold
by: Acharya, Pratyush, et al.
Published: (2026)
Provable Training Data Identification for Large Language Models
by: Liu, Zhenlong, et al.
Published: (2025)
A Statistical Theory of Regularization-Based Continual Learning
by: Zhao, Xuyang, et al.
Published: (2024)
First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
by: Khanh, Truong Xuan, et al.
Published: (2026)