| Main Authors: | O'Neill, Charles, Ye, Christine, Iyer, Kartheik, Wu, John F. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.00657 |
Similar Items
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
by: O'Neill, Charles, et al.
Published: (2024)
Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2025)
Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
by: O'Neill, Charles
Published: (2025)
Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
by: Park, Seongwan, et al.
Published: (2025)
Sketching the Heat Kernel: Using Gaussian Processes to Embed Data
by: Gilbert, Anna C., et al.
Published: (2024)
Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees
by: O'Neill, Eoghan
Published: (2025)
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
by: Miller, Jack, et al.
Published: (2023)
Modelling the Doughnut of social and planetary boundaries with frugal machine learning
by: Vrizzi, Stefano, et al.
Published: (2025)
Measuring Sharpness in Grokking
by: Miller, Jack, et al.
Published: (2024)
From superposition to sparse codes: interpretable representations in neural networks
by: Klindt, David, et al.
Published: (2025)
A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual Hallucinations
by: O'Neill, Charles, et al.
Published: (2025)
Re-envisioning Euclid Galaxy Morphology: Identifying and Interpreting Features with Sparse Autoencoders
by: Wu, John F., et al.
Published: (2025)
CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
by: Gilbert, Anna C., et al.
Published: (2023)
Low-Rank Key Value Attention
by: O'Neill, James, et al.
Published: (2026)
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)
Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models
by: Kubaty, Piotr, et al.
Published: (2026)
Ensembling Sparse Autoencoders
by: Gadgil, Soham, et al.
Published: (2025)
Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
by: Ye, Mengyu, et al.
Published: (2025)
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
by: Chaudhary, Maheep, et al.
Published: (2024)
Steering Language Model Refusal with Sparse Autoencoders
by: O'Brien, Kyle, et al.
Published: (2024)
Toward Identifiable Sparse Autoencoders
by: Nelson, Walter, et al.
Published: (2026)
Analysis of Variational Sparse Autoencoders
by: Baker, Zachary, et al.
Published: (2025)
Neighbor Embedding for High-Dimensional Sparse Poisson Data
by: Mudrik, Noga, et al.
Published: (2026)
Self-evolving Autoencoder Embedded Q-Network
by: Senthilnath, J., et al.
Published: (2024)
Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)
Training Superior Sparse Autoencoders for Instruct Models
by: Li, Jiaming, et al.
Published: (2025)
Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder
by: Koune, Ioannis Christoforos, et al.
Published: (2025)
Disentanglement of Sources in a Multi-Stream Variational Autoencoder
by: Boukun, Veranika, et al.
Published: (2025)
Disentangled Graph Autoencoder for Treatment Effect Estimation
by: Fan, Di, et al.
Published: (2024)
Disentanglement with Factor Quantized Variational Autoencoders
by: Baykal, Gulcin, et al.
Published: (2024)
Decomposing The Dark Matter of Sparse Autoencoders
by: Engels, Joshua, et al.
Published: (2024)
Transcoders Beat Sparse Autoencoders for Interpretability
by: Paulo, Gonçalo, et al.
Published: (2025)
Evaluating Sparse Autoencoders for Monosemantic Representation
by: Fereidouni, Moghis, et al.
Published: (2025)
Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)
Dynamic Sparse Training of Diagonally Sparse Networks
by: Tyagi, Abhishek, et al.
Published: (2025)
Graph-Regularized Sparse Autoencoders for LLM Safety Steering
by: Yeon, Jehyeok, et al.
Published: (2025)
Evaluating Sources: Strategies for Faculty-Librarian-Student Collaboration.
by: Simmons-O'Neill, Elizabeth
Published: (1990)
Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)
Efficient Dictionary Learning with Switch Sparse Autoencoders
by: Mudide, Anish, et al.
Published: (2024)