:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Yifan; Zhan, Yang; Jiang, Jiedong; Liu, Tianyang; Wu, Mingrui; Zhou, Zhennan; Dong, Bin
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.11881

Similar Items

InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
by: Luo, Yifan, et al.
Published: (2025)

Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting
by: Luo, Yifan, et al.
Published: (2024)

The Geometry of Concepts: Sparse Autoencoder Feature Structure
by: Li, Yuxiao, et al.
Published: (2024)

Measuring Sparse Autoencoder Feature Sensitivity
by: Tian, Claire, et al.
Published: (2025)

Graph-Regularized Sparse Autoencoders for LLM Safety Steering
by: Yeon, Jehyeok, et al.
Published: (2025)

Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)

Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders
by: Cao, Tue M., et al.
Published: (2026)

Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)

Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures
by: Muchane, Mark, et al.
Published: (2025)

Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)

From Token Lists to Graph Motifs: Weisfeiler-Lehman Analysis of Sparse Autoencoder Features
by: Fernandez-Boullon, Ruben, et al.
Published: (2026)

Feature Starvation as Geometric Instability in Sparse Autoencoders
by: Chaudhry, Faris, et al.
Published: (2026)

Causal Interpretation of Sparse Autoencoder Features in Vision
by: Han, Sangyu, et al.
Published: (2025)

Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)

Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)

Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
by: Wang, Anyi, et al.
Published: (2025)

Learning Multi-Level Features with Matryoshka Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2025)

Domain-Filtered Knowledge Graphs from Sparse Autoencoder Features
by: Winnicki, John, et al.
Published: (2026)

LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
by: Gao, Guoxiong, et al.
Published: (2026)

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)

Improving Steering Vectors by Targeting Sparse Autoencoder Features
by: Chalnev, Sviatoslav, et al.
Published: (2024)

Autoencoding Random Forests
by: Vu, Binh Duc, et al.
Published: (2025)

Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
by: Chanin, David, et al.
Published: (2024)

Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders
by: Chen, Siyu, et al.
Published: (2025)

Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
by: Sainsbury, Chris, et al.
Published: (2026)

AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
by: Zhu, Xudong, et al.
Published: (2025)

Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression
by: Cao, Tue M., et al.
Published: (2026)

Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation
by: Zhang, Weifan, et al.
Published: (2026)

Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
by: Zhao, Haiyan, et al.
Published: (2025)

Herald: A Natural Language Annotated Lean 4 Dataset
by: Gao, Guoxiong, et al.
Published: (2024)

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
by: Pach, Mateusz, et al.
Published: (2025)

Constrain Alignment with Sparse Autoencoders
by: Yin, Qingyu, et al.
Published: (2024)

A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)

Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)

CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
by: Cho, Seonglae, et al.
Published: (2025)

SparseRM: A Lightweight Preference Modeling with Sparse Autoencoder
by: Liu, Dengcan, et al.
Published: (2025)

Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features
by: Cho, Seonglae, et al.
Published: (2026)

Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)

Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
by: Bărbălau, Antonio, et al.
Published: (2025)