:: Library Catalog

Bibliographic Details
Main Authors:	Gadgil, Soham; Lin, Chris; Lee, Su-In
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.16077

Similar Items

Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
by: Gadgil, Soham, et al.
Published: (2026)

Estimating Conditional Mutual Information for Dynamic Feature Selection
by: Gadgil, Soham, et al.
Published: (2023)

SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image (T2I) Models
by: Lu, Mingyu, et al.
Published: (2026)

Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)

Transformer-based Time-Series Biomarker Discovery for COPD Diagnosis
by: Gadgil, Soham, et al.
Published: (2024)

Classification for everyone : Building geography agnostic models for fairer recognition
by: Jindal, Akshat, et al.
Published: (2023)

Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders
by: Chen, Xin, et al.
Published: (2025)

Improving Sparse Autoencoder with Dynamic Attention
by: Wang, Dongsheng, et al.
Published: (2026)

Ensemble Visualization With Variational Autoencoder
by: Wu, Cenyang, et al.
Published: (2025)

Analysis of Variational Sparse Autoencoders
by: Baker, Zachary, et al.
Published: (2025)

Toward Identifiable Sparse Autoencoders
by: Nelson, Walter, et al.
Published: (2026)

Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
by: Lee, Sewoong, et al.
Published: (2025)

Transcoders Beat Sparse Autoencoders for Interpretability
by: Paulo, Gonçalo, et al.
Published: (2025)

Evaluating Sparse Autoencoders for Monosemantic Representation
by: Fereidouni, Moghis, et al.
Published: (2025)

Decomposing The Dark Matter of Sparse Autoencoders
by: Engels, Joshua, et al.
Published: (2024)

Disentangling Dense Embeddings with Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2024)

Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)

Attacker Behaviour Profiling using Stochastic Ensemble of Hidden Markov Models
by: Deshmukh, Soham, et al.
Published: (2019)

Sparse Autoencoders Do Not Find Canonical Units of Analysis
by: Leask, Patrick, et al.
Published: (2025)

Low-Rank Adapting Models for Sparse Autoencoders
by: Chen, Matthew, et al.
Published: (2025)

Attribution-Guided Distillation of Matryoshka Sparse Autoencoders
by: Martin-Linares, Cristina P., et al.
Published: (2025)

Interpretable Reward Model via Sparse Autoencoder
by: Zhang, Shuyi, et al.
Published: (2025)

Efficient Dictionary Learning with Switch Sparse Autoencoders
by: Mudide, Anish, et al.
Published: (2024)

Steering Language Model Refusal with Sparse Autoencoders
by: O'Brien, Kyle, et al.
Published: (2024)

Stable and Steerable Sparse Autoencoders with Weight Regularization
by: Jedryszek, Piotr, et al.
Published: (2026)

Interpreting Attention Layer Outputs with Sparse Autoencoders
by: Kissane, Connor, et al.
Published: (2024)

Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
by: Sainsbury, Chris, et al.
Published: (2026)

BatchTopK Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2024)

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
by: Kantamneni, Subhash, et al.
Published: (2025)

Ensemble of Precision-Recall Curve (PRC) Classification Trees with Autoencoders
by: Miao, Jiaju, et al.
Published: (2025)

Sparse Autoencoders are Topic Models
by: Girrbach, Leander, et al.
Published: (2025)

A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction
by: Gadgil, Karan, et al.
Published: (2023)

Route Sparse Autoencoder to Interpret Large Language Models
by: Shi, Wei, et al.
Published: (2025)

Step-Level Sparse Autoencoder for Reasoning Process Interpretation
by: Yang, Xuan, et al.
Published: (2026)

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
by: Makelov, Aleksandar, et al.
Published: (2024)

Behavioral Sequence Modeling with Ensemble Learning
by: Kawawa-Beaudan, Maxime, et al.
Published: (2024)

Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)

One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
by: Surkov, Viacheslav, et al.
Published: (2024)

Empirical Evaluation of Progressive Coding for Sparse Autoencoders
by: Peter, Hans, et al.
Published: (2025)