| Main Authors: | Diaz, Fernando; Madaio, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2307.03201 |
Similar Items
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
by: Defilippis, Leonardo, et al.
Published: (2025)
Enhancing Noise-Robust Losses for Large-Scale Noisy Data Learning
by: Staats, Max, et al.
Published: (2023)
Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
by: Sarmiento, Lucas Fernandez
Published: (2026)
How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator
by: Kantamneni, Subhash, et al.
Published: (2024)
Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model
by: Levi, Noam
Published: (2026)
Explaining Neural Scaling Laws
by: Bahri, Yasaman, et al.
Published: (2021)
Neural Scaling Laws Rooted in the Data Distribution
by: Brill, Ari
Published: (2024)
A Dynamical Model of Neural Scaling Laws
by: Bordelon, Blake, et al.
Published: (2024)
Predictive Coding Graphs are a Superset of Feedforward Neural Networks
by: van Zwol, Björn
Published: (2026)
Nature-Inspired Local Propagation
by: Betti, Alessandro, et al.
Published: (2024)
Approximation Theory for Neural Networks: Old and New
by: Mukherjee, Soumendu Sundar, et al.
Published: (2026)
Preisach Attention: A Hysteretic Model of Sequential Memory
by: Frydrych, Piotr
Published: (2026)
Predictive Coding Networks and Inference Learning: Tutorial and Survey
by: van Zwol, Björn, et al.
Published: (2024)
How Feature Learning Can Improve Neural Scaling Laws
by: Bordelon, Blake, et al.
Published: (2024)
Asymmetric Scaling Laws from Sparse Features
by: Sous, John, et al.
Published: (2026)
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
by: Bordelon, Blake, et al.
Published: (2025)
Intuition emerges in Maximum Caliber models at criticality
by: Arola-Fernández, Lluís
Published: (2025)
Emergent Slow Thinking in LLMs as Inverse Tree Freezing
by: Hu, Sihan, et al.
Published: (2025)
Towards Worst-Case Guarantees with Scale-Aware Interpretability
by: Greenspan, Lauren, et al.
Published: (2026)
When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs
by: Tanaka, Hidenori
Published: (2026)
No Free Lunch From Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions
by: Ruben, Benjamin S., et al.
Published: (2024)
Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
by: Bordelon, Blake, et al.
Published: (2026)
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
by: Cagnetta, Francesco, et al.
Published: (2025)
A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
by: Kalra, Dayal Singh, et al.
Published: (2026)
Discover physical concepts and equations with machine learning
by: Li, Bao-Bing, et al.
Published: (2024)
Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics
by: Avidan, Yehonatan, et al.
Published: (2023)
Generalization through variance: how noise shapes inductive biases in diffusion models
by: Vastola, John J.
Published: (2025)
Identifying internal patterns in (1+1)-dimensional directed percolation using neural networks
by: Parkhomenko, Danil, et al.
Published: (2025)
Grokking vs. Learning: Same Features, Different Encodings
by: Manning-Coe, Dmitry, et al.
Published: (2025)
Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
by: Kalra, Dayal Singh, et al.
Published: (2026)
A Spin Glass Characterization of Neural Networks
by: Li, Jun
Published: (2025)
A method for quantifying the generalization capabilities of generative models for solving Ising models
by: Ma, Qunlong, et al.
Published: (2024)
Applications of Statistical Field Theory in Deep Learning
by: Ringel, Zohar, et al.
Published: (2025)
Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
by: Lauditi, Clarissa, et al.
Published: (2026)
KAN: Kolmogorov-Arnold Networks
by: Liu, Ziming, et al.
Published: (2024)
On the origin of neural scaling laws: from random graphs to natural language
by: Barkeshli, Maissam, et al.
Published: (2026)
A Geometric Perspective on the Difficulties of Learning GNN-based SAT Solvers
by: Skenderi, Geri
Published: (2025)
The Persian Rug: solving toy models of superposition using large-scale symmetries
by: Cowsik, Aditya, et al.
Published: (2024)
More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)
by: Meir, Sagi, et al.
Published: (2026)
Representation Learning on a Random Lattice
by: Brill, Aryeh
Published: (2025)