:: Library Catalog

Bibliographic Details
Main Authors:	Wang, George; Baker, Garrett; Gordon, Andrew; Murfet, Daniel
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2508.00331

Similar Items

Towards Spectroscopy: Susceptibility Clusters in Language Models
by: Gordon, Andrew, et al.
Published: (2026)

Structural Inference: Interpreting Small Language Models with Susceptibilities
by: Baker, Garrett, et al.
Published: (2025)

Patterning: The Dual of Interpretability
by: Wang, George, et al.
Published: (2026)

Modes of Sequence Models and Learning Coefficients
by: Chen, Zhongtian, et al.
Published: (2025)

Linear Response Estimators for Singular Statistical Models
by: Elliott, Chris, et al.
Published: (2026)

Programs as Singularities
by: Murfet, Daniel, et al.
Published: (2025)

Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning
by: Elliott, Chris, et al.
Published: (2026)

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)

Interpreting Reinforcement Learning Agents with Susceptibilities
by: Elliott, Chris, et al.
Published: (2026)

The Local Learning Coefficient: A Singularity-Aware Complexity Measure
by: Lau, Edmund, et al.
Published: (2023)

Dynamics of Transient Structure in In-Context Linear Regression Transformers
by: Carroll, Liam, et al.
Published: (2025)

Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)

In-Context Clustering with Large Language Models
by: Wang, Ying, et al.
Published: (2025)

Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory
by: Urdshals, Einar, et al.
Published: (2025)

Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
by: Elliott, Chris, et al.
Published: (2026)

You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
by: Lehalleur, Simon Pepin, et al.
Published: (2025)

Large Language Models Are Zero-Shot Time Series Forecasters
by: Gruver, Nate, et al.
Published: (2023)

Deep Learning is Not So Mysterious or Different
by: Wilson, Andrew Gordon
Published: (2025)

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay
by: Marek, Martin, et al.
Published: (2026)

A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
by: Brisley, Trinnhallen, et al.
Published: (2025)

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
by: Lotfi, Sanae, et al.
Published: (2024)

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)

Non-Vacuous Generalization Bounds for Large Language Models
by: Lotfi, Sanae, et al.
Published: (2023)

Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
by: Baker, Mohammed Abu, et al.
Published: (2025)

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
by: Gruver, Nate, et al.
Published: (2024)

Coding historical causes of death data with Large Language Models
by: Pedersen, Bjørn, et al.
Published: (2024)

SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
by: Cavanagh, Joseph M., et al.
Published: (2024)

Transferring Knowledge from Large Foundation Models to Small Downstream Models
by: Qiu, Shikai, et al.
Published: (2024)

Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
by: Amin, Alan Nawzad, et al.
Published: (2024)

Scaling Sign Language Translation
by: Zhang, Biao, et al.
Published: (2024)

SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models
by: Sun, Kunyang, et al.
Published: (2025)

LMD3: Language Model Data Density Dependence
by: Kirchenbauer, John, et al.
Published: (2024)

Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
by: Amin, Alan N., et al.
Published: (2025)

Controllable Prompt Tuning For Balancing Group Distributional Robustness
by: Phan, Hoang, et al.
Published: (2024)

Modeling Real-Time Interactive Conversations as Timed Diarized Transcripts
by: Tanzer, Garrett, et al.
Published: (2024)

Localizing Paragraph Memorization in Language Models
by: Stoehr, Niklas, et al.
Published: (2024)

IMU-1: Sample-Efficient Pre-training of Small Language Models
by: Grigorev, George
Published: (2026)

Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
by: Amin, Alan N., et al.
Published: (2025)

A Primal-Dual Algorithm for Hybrid Federated Learning
by: Overman, Tom, et al.
Published: (2022)

Validating Climate Models with Spherical Convolutional Wasserstein Distance
by: Garrett, Robert C., et al.
Published: (2024)