| Main Authors: | Wang, George; Murfet, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13548 |
Similar Items
Structural Inference: Interpreting Small Language Models with Susceptibilities
by: Baker, Garrett, et al.
Published: (2025)
Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning
by: Elliott, Chris, et al.
Published: (2026)
Interpreting Reinforcement Learning Agents with Susceptibilities
by: Elliott, Chris, et al.
Published: (2026)
Embryology of a Language Model
by: Wang, George, et al.
Published: (2025)
Modes of Sequence Models and Learning Coefficients
by: Chen, Zhongtian, et al.
Published: (2025)
Programs as Singularities
by: Murfet, Daniel, et al.
Published: (2025)
Linear Response Estimators for Singular Statistical Models
by: Elliott, Chris, et al.
Published: (2026)
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)
Towards Spectroscopy: Susceptibility Clusters in Language Models
by: Gordon, Andrew, et al.
Published: (2026)
The Local Learning Coefficient: A Singularity-Aware Complexity Measure
by: Lau, Edmund, et al.
Published: (2023)
Dynamics of Transient Structure in In-Context Linear Regression Transformers
by: Carroll, Liam, et al.
Published: (2025)
Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)
Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
by: Elliott, Chris, et al.
Published: (2026)
Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory
by: Urdshals, Einar, et al.
Published: (2025)
You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
by: Lehalleur, Simon Pepin, et al.
Published: (2025)
Open Problems in Mechanistic Interpretability
by: Sharkey, Lee, et al.
Published: (2025)
Interpreting Learned Feedback Patterns in Large Language Models
by: Marks, Luke, et al.
Published: (2023)
Dual Interpretation of Machine Learning Forecasts
by: Coulombe, Philippe Goulet, et al.
Published: (2024)
Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee
by: Chen, George H.
Published: (2022)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
by: Makelov, Aleksandar, et al.
Published: (2024)
Learning to Explain: Supervised Token Attribution from Transformer Attention Patterns
by: Mihaila, George
Published: (2026)
Unveiling Global Interactive Patterns across Graphs: Towards Interpretable Graph Neural Networks
by: Wang, Yuwen, et al.
Published: (2024)
CardioPatternFormer: Pattern-Guided Attention for Interpretable ECG Classification with Transformer Architecture
by: Uğraş, Berat Kutay, et al.
Published: (2025)
ST-TGExplainer: Disentangling Stability and Transition Patterns for Temporal GNN Interpretability
by: Chen, Hongjiang, et al.
Published: (2026)
Deep Kernel Aalen-Johansen Estimator: An Interpretable and Flexible Neural Net Framework for Competing Risks
by: Shen, Xiaobin, et al.
Published: (2025)
Interpretable Dual-Stream Learning for Local Wind Hazard Prediction in Vulnerable Communities
by: Nishu, Mahmuda Akhter, et al.
Published: (2025)
Domain-Enhanced Dual-Branch Model for Efficient and Interpretable Accident Anticipation
by: Guan, Yanchen, et al.
Published: (2025)
Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning
by: Tian, Haozhe, et al.
Published: (2025)
Finding Patterns in Ambiguity: Interpretable Stress Testing in the Decision Boundary
by: Gomes, Inês, et al.
Published: (2024)
Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction
by: Weng, Wenchao, et al.
Published: (2024)
DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts
by: Gai, Jiading, et al.
Published: (2026)
Optimizing Interpretable Decision Tree Policies for Reinforcement Learning
by: Vos, Daniël, et al.
Published: (2024)
RiemannONets: Interpretable Neural Operators for Riemann Problems
by: Peyvan, Ahmad, et al.
Published: (2024)
Towards Understanding Sensitive and Decisive Patterns in Explainable AI: A Case Study of Model Interpretation in Geometric Deep Learning
by: Zhu, Jiajun, et al.
Published: (2024)
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data
by: Johnson, Daniel D.
Published: (2024)
GIMLET: Generalizable and Interpretable Model Learning through Embedded Thermodynamics
by: Shiratori, Suguru, et al.
Published: (2025)
The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling
by: Kerce, J. Clayton, et al.
Published: (2026)
Quantum Doeblin Coefficients: Interpretations and Applications
by: George, Ian, et al.
Published: (2025)
Diet-ODIN: A Novel Framework for Opioid Misuse Detection with Interpretable Dietary Patterns
by: Zhang, Zheyuan, et al.
Published: (2024)
Computational Design of Low-Volatility Lubricants for Space Using Interpretable Machine Learning
by: Miliate, Daniel, et al.
Published: (2025)