:: Library Catalog

Bibliographic Details
Main Authors:	Tan, Zhiquan; Huang, Weiran
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2311.06597

Similar Items

The Information of Large Language Model Geometry
by: Tan, Zhiquan, et al.
Published: (2024)

ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
by: Li, Hong, et al.
Published: (2024)

Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
by: Wei, Lai, et al.
Published: (2024)

Matrix Information Theory for Self-Supervised Learning
by: Zhang, Yifan, et al.
Published: (2023)

Provable Contrastive Continual Learning
by: Wen, Yichen, et al.
Published: (2024)

Topological Signatures of Grokking
by: Tang, Yifan, et al.
Published: (2026)

Information-Theoretic Perspectives on Optimizers
by: Tan, Zhiquan, et al.
Published: (2025)

Grokking Explained: A Statistical Phenomenon
by: Carvalho, Breno W., et al.
Published: (2025)

Grokking in Linear Models for Logistic Regression
by: Das, Nataraj, et al.
Published: (2026)

Controlling Grokking with Nonlinearity and Data Symmetry
by: Salah, Ahmed, et al.
Published: (2024)

Grokfast: Accelerated Grokking by Amplifying Slow Gradients
by: Lee, Jaerin, et al.
Published: (2024)

Progress Measures for Grokking on Real-world Tasks
by: Golechha, Satvik
Published: (2024)

Grokking Group Multiplication with Cosets
by: Stander, Dashiell, et al.
Published: (2023)

Grokking Finite-Dimensional Algebra
by: Notsawo, Pascal Jr Tikeng, et al.
Published: (2026)

Muon Optimizer Accelerates Grokking
by: Tveit, Amund, et al.
Published: (2025)

Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
by: Jeffares, Alan, et al.
Published: (2024)

Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation
by: Salah, Ahmed, et al.
Published: (2025)

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking
by: Xu, Yongzhong
Published: (2026)

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold
by: Musat, Tiberiu
Published: (2025)

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
by: Zhou, Xinyu, et al.
Published: (2025)

Early-Warning Signals of Grokking via Loss-Landscape Geometry
by: Xu, Yongzhong
Published: (2026)

Grokking and Generalization Collapse: Insights from HTSR theory
by: Prakash, Hari K., et al.
Published: (2025)

Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
by: Hidajat, Kai, et al.
Published: (2026)

Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent
by: Chou, Chi-Ning, et al.
Published: (2026)

Grokking as a Falsifiable Finite-Size Transition
by: Bi, Yuda, et al.
Published: (2026)

Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
by: Park, Yeachan, et al.
Published: (2024)

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)

Grokking at the Edge of Numerical Stability
by: Prieto, Lucas, et al.
Published: (2025)

A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
by: Shen, Yiyang, et al.
Published: (2025)

The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization
by: Khanh, Truong Xuan, et al.
Published: (2026)

Lean Finder: Semantic Search for Mathlib That Understands User Intents
by: Lu, Jialin, et al.
Published: (2025)

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)

Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
by: Xu, Yongzhong
Published: (2026)

Grokking Beyond the Euclidean Norm of Model Parameters
by: Notsawo, Pascal Jr Tikeng, et al.
Published: (2025)

The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
by: Xu, Yongzhong
Published: (2026)

Grokking as a Variance-Limited Phase Transition: Spectral Gating and the Epsilon-Stability Threshold
by: Acharya, Pratyush, et al.
Published: (2026)

Provable Training Data Identification for Large Language Models
by: Liu, Zhenlong, et al.
Published: (2025)

A Statistical Theory of Regularization-Based Continual Learning
by: Zhao, Xuyang, et al.
Published: (2024)

First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
by: Khanh, Truong Xuan, et al.
Published: (2026)