:: Library Catalog

Bibliographic Details
Main Authors:	O'Neill, Charles; Ye, Christine; Iyer, Kartheik; Wu, John F.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2408.00657

Similar Items

Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
by: O'Neill, Charles, et al.
Published: (2024)

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)

Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2025)

Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
by: O'Neill, Charles
Published: (2025)

Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
by: Park, Seongwan, et al.
Published: (2025)

Sketching the Heat Kernel: Using Gaussian Processes to Embed Data
by: Gilbert, Anna C., et al.
Published: (2024)

Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees
by: O'Neill, Eoghan
Published: (2025)

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
by: Miller, Jack, et al.
Published: (2023)

Modelling the Doughnut of social and planetary boundaries with frugal machine learning
by: Vrizzi, Stefano, et al.
Published: (2025)

Measuring Sharpness in Grokking
by: Miller, Jack, et al.
Published: (2024)

From superposition to sparse codes: interpretable representations in neural networks
by: Klindt, David, et al.
Published: (2025)

A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual Hallucinations
by: O'Neill, Charles, et al.
Published: (2025)

Re-envisioning Euclid Galaxy Morphology: Identifying and Interpreting Features with Sparse Autoencoders
by: Wu, John F., et al.
Published: (2025)

CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
by: Gilbert, Anna C., et al.
Published: (2023)

Low-Rank Key Value Attention
by: O'Neill, James, et al.
Published: (2026)

Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)

Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models
by: Kubaty, Piotr, et al.
Published: (2026)

Ensembling Sparse Autoencoders
by: Gadgil, Soham, et al.
Published: (2025)

Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
by: Ye, Mengyu, et al.
Published: (2025)

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
by: Chaudhary, Maheep, et al.
Published: (2024)

Steering Language Model Refusal with Sparse Autoencoders
by: O'Brien, Kyle, et al.
Published: (2024)

Toward Identifiable Sparse Autoencoders
by: Nelson, Walter, et al.
Published: (2026)

Analysis of Variational Sparse Autoencoders
by: Baker, Zachary, et al.
Published: (2025)

Neighbor Embedding for High-Dimensional Sparse Poisson Data
by: Mudrik, Noga, et al.
Published: (2026)

Self-evolving Autoencoder Embedded Q-Network
by: Senthilnath, J., et al.
Published: (2024)

Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)

Training Superior Sparse Autoencoders for Instruct Models
by: Li, Jiaming, et al.
Published: (2025)

Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder
by: Koune, Ioannis Christoforos, et al.
Published: (2025)

Disentanglement of Sources in a Multi-Stream Variational Autoencoder
by: Boukun, Veranika, et al.
Published: (2025)

Disentangled Graph Autoencoder for Treatment Effect Estimation
by: Fan, Di, et al.
Published: (2024)

Disentanglement with Factor Quantized Variational Autoencoders
by: Baykal, Gulcin, et al.
Published: (2024)

Decomposing The Dark Matter of Sparse Autoencoders
by: Engels, Joshua, et al.
Published: (2024)

Transcoders Beat Sparse Autoencoders for Interpretability
by: Paulo, Gonçalo, et al.
Published: (2025)

Evaluating Sparse Autoencoders for Monosemantic Representation
by: Fereidouni, Moghis, et al.
Published: (2025)

Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)

Dynamic Sparse Training of Diagonally Sparse Networks
by: Tyagi, Abhishek, et al.
Published: (2025)

Graph-Regularized Sparse Autoencoders for LLM Safety Steering
by: Yeon, Jehyeok, et al.
Published: (2025)

Evaluating Sources: Strategies for Faculty-Librarian-Student Collaboration
by: Simmons-O'Neill, Elizabeth
Published: (1990)

Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)

Efficient Dictionary Learning with Switch Sparse Autoencoders
by: Mudide, Anish, et al.
Published: (2024)