| Main Authors: | Gordon, Andrew; Baker, Garrett; Wang, George; Snell, William; van Wingerden, Stan; Murfet, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.12703 |
Similar Items
Embryology of a Language Model
by: Wang, George, et al.
Published: (2025)
Structural Inference: Interpreting Small Language Models with Susceptibilities
by: Baker, Garrett, et al.
Published: (2025)
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)
Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory
by: Urdshals, Einar, et al.
Published: (2025)
Patterning: The Dual of Interpretability
by: Wang, George, et al.
Published: (2026)
Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning
by: Elliott, Chris, et al.
Published: (2026)
Interpreting Reinforcement Learning Agents with Susceptibilities
by: Elliott, Chris, et al.
Published: (2026)
Modes of Sequence Models and Learning Coefficients
by: Chen, Zhongtian, et al.
Published: (2025)
Linear Response Estimators for Singular Statistical Models
by: Elliott, Chris, et al.
Published: (2026)
Programs as Singularities
by: Murfet, Daniel, et al.
Published: (2025)
In-Context Clustering with Large Language Models
by: Wang, Ying, et al.
Published: (2025)
The Local Learning Coefficient: A Singularity-Aware Complexity Measure
by: Lau, Edmund, et al.
Published: (2023)
Dynamics of Transient Structure in In-Context Linear Regression Transformers
by: Carroll, Liam, et al.
Published: (2025)
Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)
Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
by: Elliott, Chris, et al.
Published: (2026)
You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
by: Lehalleur, Simon Pepin, et al.
Published: (2025)
Meta-Learning at Scale for Large Language Models via Low-Rank Amortized Bayesian Meta-Learning
by: Zhang, Liyi, et al.
Published: (2025)
Large Language Models Are Zero-Shot Time Series Forecasters
by: Gruver, Nate, et al.
Published: (2023)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
by: Snell, Charlie, et al.
Published: (2024)
Conformal Prediction as Bayesian Quadrature
by: Snell, Jake C., et al.
Published: (2025)
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
by: Fan, Zehao, et al.
Published: (2025)
ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)
SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning
by: Tan, Cheng, et al.
Published: (2022)
Deep Learning is Not So Mysterious or Different
by: Wilson, Andrew Gordon
Published: (2025)
Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay
by: Marek, Martin, et al.
Published: (2026)
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
by: Zollo, Thomas P., et al.
Published: (2023)
On the Reproducibility of "FairCLIP: Harnessing Fairness in Vision-Language Learning"
by: Bakker, Hua Chang, et al.
Published: (2025)
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
by: Lotfi, Sanae, et al.
Published: (2024)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)
Non-Vacuous Generalization Bounds for Large Language Models
by: Lotfi, Sanae, et al.
Published: (2023)
Predicting Emergent Capabilities by Finetuning
by: Snell, Charlie, et al.
Published: (2024)
Beyond the Academic Monoculture: A Unified Framework and Industrial Perspective for Attributed Graph Clustering
by: Liu, Yunhui, et al.
Published: (2026)
Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis
by: Zhou, Yun, et al.
Published: (2024)
Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions
by: Song, Yao, et al.
Published: (2025)
A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
by: Brisley, Trinnhallen, et al.
Published: (2025)
Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
by: Wu, Fang, et al.
Published: (2026)
Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
by: Baker, Mohammed Abu, et al.
Published: (2025)
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
by: Liu, Zicheng, et al.
Published: (2024)
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
by: Gruver, Nate, et al.
Published: (2024)
Coding historical causes of death data with Large Language Models
by: Pedersen, Bjørn, et al.
Published: (2024)