Library Catalog

Bibliographic Details
Main Authors:	Simon, Elana; Adams, Etowah; Zou, James
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.31518

Similar Items

InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders
by: Simon, Elana, et al.
Published: (2024)

Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts
by: Yan, Xinyuan, et al.
Published: (2025)

Exploring Urban Factors with Autoencoders: Relationship Between Static and Dynamic Features
by: Pocco, Ximena, et al.
Published: (2025)

Beyond Activation Patterns: A Weight-Based Out-of-Context Explanation of Sparse Autoencoder Features
by: Liu, Yiting, et al.
Published: (2026)

Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations
by: Wieciech, Bartosz, et al.
Published: (2026)

Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)

Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)

Do Sparse Autoencoders Identify Reasoning Features in Language Models?
by: Ma, George, et al.
Published: (2026)

Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders
by: Cao, Tue M., et al.
Published: (2026)

OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
by: Korznikov, Anton, et al.
Published: (2025)

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
by: Marks, Luke, et al.
Published: (2024)

Sparse Autoencoders Trained on the Same Data Learn Different Features
by: Paulo, Gonçalo, et al.
Published: (2025)

PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding
by: Koromilas, Panagiotis, et al.
Published: (2026)

Feature Starvation as Geometric Instability in Sparse Autoencoders
by: Chaudhry, Faris, et al.
Published: (2026)

MoRFI: Monotonic Sparse Autoencoder Feature Identification
by: Dimakopoulos, Dimitris, et al.
Published: (2026)

The Geometry of Concepts: Sparse Autoencoder Feature Structure
by: Li, Yuxiao, et al.
Published: (2024)

Learning Multi-Level Features with Matryoshka Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2025)

Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
by: Farnik, Lucy, et al.
Published: (2025)

Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
by: Hindupur, Sai Sumedh R., et al.
Published: (2025)

Improving Steering Vectors by Targeting Sparse Autoencoder Features
by: Chalnev, Sviatoslav, et al.
Published: (2024)

Ensembling Sparse Autoencoders
by: Gadgil, Soham, et al.
Published: (2025)

Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)

Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)

Interpreting Outliers in Time Series Data through Decoding Autoencoder
by: Knab, Patrick, et al.
Published: (2024)

Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers
by: Yang, Qingyuan, et al.
Published: (2024)

Deep Transductive Outlier Detection
by: Klüttermann, Simon, et al.
Published: (2024)

Model Unlearning via Sparse Autoencoder Subspace Guided Projections
by: Wang, Xu, et al.
Published: (2025)

Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression
by: Cao, Tue M., et al.
Published: (2026)

MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction
by: Tan, Xu, et al.
Published: (2023)

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)

AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
by: Zhu, Xudong, et al.
Published: (2025)

Toward Identifiable Sparse Autoencoders
by: Nelson, Walter, et al.
Published: (2026)

Analysis of Variational Sparse Autoencoders
by: Baker, Zachary, et al.
Published: (2025)

SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models
by: Zhang, Mingxu, et al.
Published: (2026)

Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)

Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations
by: Joshi, Shruti, et al.
Published: (2025)

Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs
by: Harshavardhan
Published: (2026)

Which Sparse Autoencoder Features Are Real? Model-X Knockoffs for False Discovery Rate Control
by: Enkhbayar, Tsogt-Ochir
Published: (2025)

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
by: Pach, Mateusz, et al.
Published: (2025)