| Main Authors: | Simon, Elana; Adams, Etowah; Zou, James |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.31518 |
Similar Items
InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders
by: Simon, Elana, et al.
Published: (2024)
Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts
by: Yan, Xinyuan, et al.
Published: (2025)
Exploring Urban Factors with Autoencoders: Relationship Between Static and Dynamic Features
by: Pocco, Ximena, et al.
Published: (2025)
Beyond Activation Patterns: A Weight-Based Out-of-Context Explanation of Sparse Autoencoder Features
by: Liu, Yiting, et al.
Published: (2026)
Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations
by: Wieciech, Bartosz, et al.
Published: (2026)
Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)
Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)
Do Sparse Autoencoders Identify Reasoning Features in Language Models?
by: Ma, George, et al.
Published: (2026)
Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders
by: Cao, Tue M., et al.
Published: (2026)
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
by: Korznikov, Anton, et al.
Published: (2025)
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
by: Marks, Luke, et al.
Published: (2024)
Sparse Autoencoders Trained on the Same Data Learn Different Features
by: Paulo, Gonçalo, et al.
Published: (2025)
PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding
by: Koromilas, Panagiotis, et al.
Published: (2026)
Feature Starvation as Geometric Instability in Sparse Autoencoders
by: Chaudhry, Faris, et al.
Published: (2026)
MoRFI: Monotonic Sparse Autoencoder Feature Identification
by: Dimakopoulos, Dimitris, et al.
Published: (2026)
The Geometry of Concepts: Sparse Autoencoder Feature Structure
by: Li, Yuxiao, et al.
Published: (2024)
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2025)
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
by: Farnik, Lucy, et al.
Published: (2025)
Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
by: Hindupur, Sai Sumedh R., et al.
Published: (2025)
Improving Steering Vectors by Targeting Sparse Autoencoder Features
by: Chalnev, Sviatoslav, et al.
Published: (2024)
Ensembling Sparse Autoencoders
by: Gadgil, Soham, et al.
Published: (2025)
Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)
Interpreting Outliers in Time Series Data through Decoding Autoencoder
by: Knab, Patrick, et al.
Published: (2024)
Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers
by: Yang, Qingyuan, et al.
Published: (2024)
Deep Transductive Outlier Detection
by: Klüttermann, Simon, et al.
Published: (2024)
Model Unlearning via Sparse Autoencoder Subspace Guided Projections
by: Wang, Xu, et al.
Published: (2025)
Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression
by: Cao, Tue M., et al.
Published: (2026)
MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction
by: Tan, Xu, et al.
Published: (2023)
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
by: Zhu, Xudong, et al.
Published: (2025)
Toward Identifiable Sparse Autoencoders
by: Nelson, Walter, et al.
Published: (2026)
Analysis of Variational Sparse Autoencoders
by: Baker, Zachary, et al.
Published: (2025)
SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models
by: Zhang, Mingxu, et al.
Published: (2026)
Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)
Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations
by: Joshi, Shruti, et al.
Published: (2025)
Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs
by: Harshavardhan
Published: (2026)
Which Sparse Autoencoder Features Are Real? Model-X Knockoffs for False Discovery Rate Control
by: Enkhbayar, Tsogt-Ochir
Published: (2025)
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
by: Pach, Mateusz, et al.
Published: (2025)