| Main Authors: | Wang, George; Baker, Garrett; Gordon, Andrew; Murfet, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.00331 |
Similar Items
Towards Spectroscopy: Susceptibility Clusters in Language Models
by: Gordon, Andrew, et al.
Published: (2026)
Structural Inference: Interpreting Small Language Models with Susceptibilities
by: Baker, Garrett, et al.
Published: (2025)
Patterning: The Dual of Interpretability
by: Wang, George, et al.
Published: (2026)
Modes of Sequence Models and Learning Coefficients
by: Chen, Zhongtian, et al.
Published: (2025)
Linear Response Estimators for Singular Statistical Models
by: Elliott, Chris, et al.
Published: (2026)
Programs as Singularities
by: Murfet, Daniel, et al.
Published: (2025)
Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning
by: Elliott, Chris, et al.
Published: (2026)
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)
Interpreting Reinforcement Learning Agents with Susceptibilities
by: Elliott, Chris, et al.
Published: (2026)
The Local Learning Coefficient: A Singularity-Aware Complexity Measure
by: Lau, Edmund, et al.
Published: (2023)
Dynamics of Transient Structure in In-Context Linear Regression Transformers
by: Carroll, Liam, et al.
Published: (2025)
Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)
In-Context Clustering with Large Language Models
by: Wang, Ying, et al.
Published: (2025)
Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory
by: Urdshals, Einar, et al.
Published: (2025)
Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
by: Elliott, Chris, et al.
Published: (2026)
You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
by: Lehalleur, Simon Pepin, et al.
Published: (2025)
Large Language Models Are Zero-Shot Time Series Forecasters
by: Gruver, Nate, et al.
Published: (2023)
Deep Learning is Not So Mysterious or Different
by: Wilson, Andrew Gordon
Published: (2025)
Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay
by: Marek, Martin, et al.
Published: (2026)
A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
by: Brisley, Trinnhallen, et al.
Published: (2025)
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
by: Lotfi, Sanae, et al.
Published: (2024)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)
Non-Vacuous Generalization Bounds for Large Language Models
by: Lotfi, Sanae, et al.
Published: (2023)
Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
by: Baker, Mohammed Abu, et al.
Published: (2025)
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
by: Gruver, Nate, et al.
Published: (2024)
Coding historical causes of death data with Large Language Models
by: Pedersen, Bjørn, et al.
Published: (2024)
SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
by: Cavanagh, Joseph M., et al.
Published: (2024)
Transferring Knowledge from Large Foundation Models to Small Downstream Models
by: Qiu, Shikai, et al.
Published: (2024)
Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
by: Amin, Alan Nawzad, et al.
Published: (2024)
Scaling Sign Language Translation
by: Zhang, Biao, et al.
Published: (2024)
SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models
by: Sun, Kunyang, et al.
Published: (2025)
LMD3: Language Model Data Density Dependence
by: Kirchenbauer, John, et al.
Published: (2024)
Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
by: Amin, Alan N., et al.
Published: (2025)
Controllable Prompt Tuning For Balancing Group Distributional Robustness
by: Phan, Hoang, et al.
Published: (2024)
Modeling Real-Time Interactive Conversations as Timed Diarized Transcripts
by: Tanzer, Garrett, et al.
Published: (2024)
Localizing Paragraph Memorization in Language Models
by: Stoehr, Niklas, et al.
Published: (2024)
IMU-1: Sample-Efficient Pre-training of Small Language Models
by: Grigorev, George
Published: (2026)
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
by: Amin, Alan N., et al.
Published: (2025)
A Primal-Dual Algorithm for Hybrid Federated Learning
by: Overman, Tom, et al.
Published: (2022)
Validating Climate Models with Spherical Convolutional Wasserstein Distance
by: Garrett, Robert C., et al.
Published: (2024)