:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Xuan; Liu, Jiayu; Lai, Yuhang; Xu, Hao; Huang, Zhenya; Miao, Ning
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.03031

Similar Items

Deep Thinking by Markov Chain of Continuous Thoughts
by: Liu, Jiayu, et al.
Published: (2025)

Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation
by: Zhao, Daniel, et al.
Published: (2025)

Verifier-Backed Hard Problem Generation for Mathematical Reasoning
by: Lai, Yuhang, et al.
Published: (2026)

Transcoders Beat Sparse Autoencoders for Interpretability
by: Paulo, Gonçalo, et al.
Published: (2025)

Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)

Interpreting CFD Surrogates through Sparse Autoencoders
by: Hu, Yeping, et al.
Published: (2025)

Interpretable Reward Model via Sparse Autoencoder
by: Zhang, Shuyi, et al.
Published: (2025)

Interpreting Attention Layer Outputs with Sparse Autoencoders
by: Kissane, Connor, et al.
Published: (2024)

Interpretable and Steerable Concept Bottleneck Sparse Autoencoders
by: Kulkarni, Akshay, et al.
Published: (2025)

Route Sparse Autoencoder to Interpret Large Language Models
by: Shi, Wei, et al.
Published: (2025)

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
by: Makelov, Aleksandar, et al.
Published: (2024)

Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)

Interpretable Company Similarity with Sparse Autoencoders
by: Molinari, Marco, et al.
Published: (2024)

Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)

Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders
by: Erdogan, Ege, et al.
Published: (2025)

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
by: Marks, Luke, et al.
Published: (2024)

Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
by: Ye, Mengyu, et al.
Published: (2025)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

Mechanistic Interpretability with Sparse Autoencoder Neural Operators
by: Tolooshams, Bahareh, et al.
Published: (2025)

Kronecker Factorization Improves Efficiency and Interpretability of Sparse Autoencoders
by: Kurochkin, Vadim, et al.
Published: (2025)

Interpreting and Steering Protein Language Models through Sparse Autoencoders
by: Garcia, Edith Natalia Villegas, et al.
Published: (2025)

Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2025)

XNNTab -- Interpretable Neural Networks for Tabular Data using Sparse Autoencoders
by: Elhadri, Khawla, et al.
Published: (2025)

SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation
by: Lu, Zhenyu, et al.
Published: (2026)

Learning Multi-Level Features with Matryoshka Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2025)

Mechanistic Interpretability of Code Correctness in LLMs via Sparse Autoencoders
by: Tahimic, Kriz, et al.
Published: (2025)

Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)

Sparse Autoencoders for Interpretable Medical Image Representation Learning
by: Wesp, Philipp, et al.
Published: (2026)

Do Sparse Autoencoders Identify Reasoning Features in Language Models?
by: Ma, George, et al.
Published: (2026)

AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations
by: Yao, Yifei, et al.
Published: (2025)

Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
by: Park, Seongwan, et al.
Published: (2025)

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
by: Cywiński, Bartosz, et al.
Published: (2025)

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
by: Karvonen, Adam, et al.
Published: (2025)

Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models
by: Yeung, Calvin, et al.
Published: (2026)

Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)

Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
by: Klenitskiy, Anton, et al.
Published: (2025)

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
by: Thasarathan, Harrish, et al.
Published: (2025)

Ensembling Sparse Autoencoders
by: Gadgil, Soham, et al.
Published: (2025)

Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)

Linear Dynamics in the RLVR Training of Large Language Models
by: Wang, Tianle, et al.
Published: (2026)