| Main Authors: | Gadgil, Soham, Lin, Chris, Lee, Su-In |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.16077 |
Similar Items
Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
by: Gadgil, Soham, et al.
Published: (2026)
Estimating Conditional Mutual Information for Dynamic Feature Selection
by: Gadgil, Soham, et al.
Published: (2023)
SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image (T2I) Models
by: Lu, Mingyu, et al.
Published: (2026)
Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)
Transformer-based Time-Series Biomarker Discovery for COPD Diagnosis
by: Gadgil, Soham, et al.
Published: (2024)
Classification for everyone : Building geography agnostic models for fairer recognition
by: Jindal, Akshat, et al.
Published: (2023)
Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders
by: Chen, Xin, et al.
Published: (2025)
Improving Sparse Autoencoder with Dynamic Attention
by: Wang, Dongsheng, et al.
Published: (2026)
Ensemble Visualization With Variational Autoencoder
by: Wu, Cenyang, et al.
Published: (2025)
Analysis of Variational Sparse Autoencoders
by: Baker, Zachary, et al.
Published: (2025)
Toward Identifiable Sparse Autoencoders
by: Nelson, Walter, et al.
Published: (2026)
Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
by: Lee, Sewoong, et al.
Published: (2025)
Transcoders Beat Sparse Autoencoders for Interpretability
by: Paulo, Gonçalo, et al.
Published: (2025)
Evaluating Sparse Autoencoders for Monosemantic Representation
by: Fereidouni, Moghis, et al.
Published: (2025)
Decomposing The Dark Matter of Sparse Autoencoders
by: Engels, Joshua, et al.
Published: (2024)
Disentangling Dense Embeddings with Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)
Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)
Attacker Behaviour Profiling using Stochastic Ensemble of Hidden Markov Models
by: Deshmukh, Soham, et al.
Published: (2019)
Sparse Autoencoders Do Not Find Canonical Units of Analysis
by: Leask, Patrick, et al.
Published: (2025)
Low-Rank Adapting Models for Sparse Autoencoders
by: Chen, Matthew, et al.
Published: (2025)
Attribution-Guided Distillation of Matryoshka Sparse Autoencoders
by: Martin-Linares, Cristina P., et al.
Published: (2025)
Interpretable Reward Model via Sparse Autoencoder
by: Zhang, Shuyi, et al.
Published: (2025)
Efficient Dictionary Learning with Switch Sparse Autoencoders
by: Mudide, Anish, et al.
Published: (2024)
Steering Language Model Refusal with Sparse Autoencoders
by: O'Brien, Kyle, et al.
Published: (2024)
Stable and Steerable Sparse Autoencoders with Weight Regularization
by: Jedryszek, Piotr, et al.
Published: (2026)
Interpreting Attention Layer Outputs with Sparse Autoencoders
by: Kissane, Connor, et al.
Published: (2024)
Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
by: Sainsbury, Chris, et al.
Published: (2026)
BatchTopK Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2024)
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
by: Kantamneni, Subhash, et al.
Published: (2025)
Ensemble of Precision-Recall Curve (PRC) Classification Trees with Autoencoders
by: Miao, Jiaju, et al.
Published: (2025)
Sparse Autoencoders are Topic Models
by: Girrbach, Leander, et al.
Published: (2025)
A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction
by: Gadgil, Karan, et al.
Published: (2023)
Route Sparse Autoencoder to Interpret Large Language Models
by: Shi, Wei, et al.
Published: (2025)
Step-Level Sparse Autoencoder for Reasoning Process Interpretation
by: Yang, Xuan, et al.
Published: (2026)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
by: Makelov, Aleksandar, et al.
Published: (2024)
Behavioral Sequence Modeling with Ensemble Learning
by: Kawawa-Beaudan, Maxime, et al.
Published: (2024)
Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)
One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
by: Surkov, Viacheslav, et al.
Published: (2024)
Empirical Evaluation of Progressive Coding for Sparse Autoencoders
by: Peter, Hans, et al.
Published: (2025)