Library Catalog

Bibliographic Details
Main Authors:	Gordon, Andrew; Baker, Garrett; Wang, George; Snell, William; van Wingerden, Stan; Murfet, Daniel
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.12703

Similar Items

Embryology of a Language Model
by: Wang, George, et al.
Published: (2025)

Structural Inference: Interpreting Small Language Models with Susceptibilities
by: Baker, Garrett, et al.
Published: (2025)

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)

Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory
by: Urdshals, Einar, et al.
Published: (2025)

Patterning: The Dual of Interpretability
by: Wang, George, et al.
Published: (2026)

Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning
by: Elliott, Chris, et al.
Published: (2026)

Interpreting Reinforcement Learning Agents with Susceptibilities
by: Elliott, Chris, et al.
Published: (2026)

Modes of Sequence Models and Learning Coefficients
by: Chen, Zhongtian, et al.
Published: (2025)

Linear Response Estimators for Singular Statistical Models
by: Elliott, Chris, et al.
Published: (2026)

Programs as Singularities
by: Murfet, Daniel, et al.
Published: (2025)

In-Context Clustering with Large Language Models
by: Wang, Ying, et al.
Published: (2025)

The Local Learning Coefficient: A Singularity-Aware Complexity Measure
by: Lau, Edmund, et al.
Published: (2023)

Dynamics of Transient Structure in In-Context Linear Regression Transformers
by: Carroll, Liam, et al.
Published: (2025)

Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)

Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
by: Elliott, Chris, et al.
Published: (2026)

You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
by: Lehalleur, Simon Pepin, et al.
Published: (2025)

Meta-Learning at Scale for Large Language Models via Low-Rank Amortized Bayesian Meta-Learning
by: Zhang, Liyi, et al.
Published: (2025)

Large Language Models Are Zero-Shot Time Series Forecasters
by: Gruver, Nate, et al.
Published: (2023)

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
by: Snell, Charlie, et al.
Published: (2024)

Conformal Prediction as Bayesian Quadrature
by: Snell, Jake C., et al.
Published: (2025)

Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
by: Fan, Zehao, et al.
Published: (2025)

ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)

SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning
by: Tan, Cheng, et al.
Published: (2022)

Deep Learning is Not So Mysterious or Different
by: Wilson, Andrew Gordon
Published: (2025)

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay
by: Marek, Martin, et al.
Published: (2026)

Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
by: Zollo, Thomas P., et al.
Published: (2023)

On the Reproducibility of "FairCLIP: Harnessing Fairness in Vision-Language Learning"
by: Bakker, Hua Chang, et al.
Published: (2025)

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
by: Lotfi, Sanae, et al.
Published: (2024)

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)

Non-Vacuous Generalization Bounds for Large Language Models
by: Lotfi, Sanae, et al.
Published: (2023)

Predicting Emergent Capabilities by Finetuning
by: Snell, Charlie, et al.
Published: (2024)

Beyond the Academic Monoculture: A Unified Framework and Industrial Perspective for Attributed Graph Clustering
by: Liu, Yunhui, et al.
Published: (2026)

Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis
by: Zhou, Yun, et al.
Published: (2024)

Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions
by: Song, Yao, et al.
Published: (2025)

A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
by: Brisley, Trinnhallen, et al.
Published: (2025)

Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
by: Wu, Fang, et al.
Published: (2026)

Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
by: Baker, Mohammed Abu, et al.
Published: (2025)

LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
by: Liu, Zicheng, et al.
Published: (2024)

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
by: Gruver, Nate, et al.
Published: (2024)

Coding historical causes of death data with Large Language Models
by: Pedersen, Bjørn, et al.
Published: (2024)