| Main Authors: | Gauderis, Ward; Dooms, Thomas; Holmer, Steven T.; Ayonrinde, Kola; Wiggins, Geraint A. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.08934 |
Similar Items
Compositionality Unlocks Deep Interpretable Models
by: Dooms, Thomas, et al.
Published: (2025)
Bilinear autoencoders find interpretable manifolds
by: Dooms, Thomas, et al.
Published: (2026)
Finding Manifolds With Bilinear Autoencoders
by: Dooms, Thomas, et al.
Published: (2025)
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
by: Ayonrinde, Kola, et al.
Published: (2025)
Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
by: Ayonrinde, Kola, et al.
Published: (2025)
Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
by: Ayonrinde, Kola, et al.
Published: (2024)
Quantum Methods for Managing Ambiguity in Natural Language Processing
by: Eisinger, Jurek, et al.
Published: (2025)
When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
by: Gonzalez, ML Nissen, et al.
Published: (2026)
BioOSS: A Bio-Inspired Oscillatory State System with Spatio-Temporal Dynamics
by: Yuan, Zhongju, et al.
Published: (2025)
Tokenized SAEs: Disentangling SAE Reconstructions
by: Dooms, Thomas, et al.
Published: (2025)
Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness
by: Santo, Luís Espírito, et al.
Published: (2024)
A novel Reservoir Architecture for Periodic Time Series Prediction
by: Yuan, Zhongju, et al.
Published: (2024)
Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems
by: Dooms, Ann
Published: (2026)
Weight-based Decomposition: A Case for Bilinear MLPs
by: Pearce, Michael T., et al.
Published: (2024)
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
by: Karvonen, Adam, et al.
Published: (2025)
Bilinear MLPs enable weight-based mechanistic interpretability
by: Pearce, Michael T., et al.
Published: (2024)
Exemplar Partitioning for Mechanistic Interpretability
by: Rumbelow, Jessica
Published: (2026)
Open Problems in Mechanistic Interpretability
by: Sharkey, Lee, et al.
Published: (2025)
Mechanistic Interpretability for Neural TSP Solvers
by: Narad, Reuben, et al.
Published: (2025)
Mechanistic Interpretability of Reinforcement Learning Agents
by: Trim, Tristan, et al.
Published: (2024)
Validating Mechanistic Interpretations: An Axiomatic Approach
by: Palumbo, Nils, et al.
Published: (2024)
Mechanistic Interpretability for Transformer-based Time Series Classification
by: Kalnāre, Matīss, et al.
Published: (2025)
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
by: Sutter, Denis, et al.
Published: (2025)
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
by: Gupta, Rohan, et al.
Published: (2024)
Neural Network-Based Piecewise Survival Models
by: Holmer, Olov, et al.
Published: (2024)
Usage-Specific Survival Modeling Based on Operational Data and Neural Networks
by: Holmer, Olov, et al.
Published: (2024)
Geospatial Mechanistic Interpretability of Large Language Models
by: De Sabbata, Stef, et al.
Published: (2025)
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
by: Bushnaq, Lucius, et al.
Published: (2024)
Challenges in Mechanistically Interpreting Model Representations
by: Golechha, Satvik, et al.
Published: (2024)
Mechanistic Interpretability of Binary and Ternary Transformers
by: Li, Jason
Published: (2024)
Bridging Mechanistic Interpretability and Prompt Engineering with Gradient Ascent for Interpretable Persona Control
by: Saini, Harshvardhan, et al.
Published: (2026)
Compact Proofs of Model Performance via Mechanistic Interpretability
by: Gross, Jason, et al.
Published: (2024)
Mechanistic Interpretability of RNNs emulating Hidden Markov Models
by: Torre, Elia, et al.
Published: (2025)
Interpretable Deep Learning for Polar Mechanistic Reaction Prediction
by: Miller, Ryan J., et al.
Published: (2025)
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
by: Winninger, Thomas, et al.
Published: (2025)
MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
by: He, Jesse, et al.
Published: (2026)
Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability
by: Long, Yanan
Published: (2025)
MIB: A Mechanistic Interpretability Benchmark
by: Mueller, Aaron, et al.
Published: (2025)
Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders
by: Erdogan, Ege, et al.
Published: (2025)