| Main Authors: | Yang, Xuan, Liu, Jiayu, Lai, Yuhang, Xu, Hao, Huang, Zhenya, Miao, Ning |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.03031 |
Similar Items
Deep Thinking by Markov Chain of Continuous Thoughts
by: Liu, Jiayu, et al.
Published: (2025)
Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation
by: Zhao, Daniel, et al.
Published: (2025)
Verifier-Backed Hard Problem Generation for Mathematical Reasoning
by: Lai, Yuhang, et al.
Published: (2026)
Transcoders Beat Sparse Autoencoders for Interpretability
by: Paulo, Gonçalo, et al.
Published: (2025)
Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)
Interpreting CFD Surrogates through Sparse Autoencoders
by: Hu, Yeping, et al.
Published: (2025)
Interpretable Reward Model via Sparse Autoencoder
by: Zhang, Shuyi, et al.
Published: (2025)
Interpreting Attention Layer Outputs with Sparse Autoencoders
by: Kissane, Connor, et al.
Published: (2024)
Interpretable and Steerable Concept Bottleneck Sparse Autoencoders
by: Kulkarni, Akshay, et al.
Published: (2025)
Route Sparse Autoencoder to Interpret Large Language Models
by: Shi, Wei, et al.
Published: (2025)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
by: Makelov, Aleksandar, et al.
Published: (2024)
Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)
Interpretable Company Similarity with Sparse Autoencoders
by: Molinari, Marco, et al.
Published: (2024)
Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)
Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders
by: Erdogan, Ege, et al.
Published: (2025)
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
by: Marks, Luke, et al.
Published: (2024)
Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
by: Ye, Mengyu, et al.
Published: (2025)
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)
Mechanistic Interpretability with Sparse Autoencoder Neural Operators
by: Tolooshams, Bahareh, et al.
Published: (2025)
Kronecker Factorization Improves Efficiency and Interpretability of Sparse Autoencoders
by: Kurochkin, Vadim, et al.
Published: (2025)
Interpreting and Steering Protein Language Models through Sparse Autoencoders
by: Garcia, Edith Natalia Villegas, et al.
Published: (2025)
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
by: O'Neill, Charles, et al.
Published: (2025)
XNNTab -- Interpretable Neural Networks for Tabular Data using Sparse Autoencoders
by: Elhadri, Khawla, et al.
Published: (2025)
SegCompass: Exploring Interpretable Alignment with Sparse Autoencoders for Enhanced Reasoning Segmentation
by: Lu, Zhenyu, et al.
Published: (2026)
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2025)
Mechanistic Interpretability of Code Correctness in LLMs via Sparse Autoencoders
by: Tahimic, Kriz, et al.
Published: (2025)
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)
Sparse Autoencoders for Interpretable Medical Image Representation Learning
by: Wesp, Philipp, et al.
Published: (2026)
Do Sparse Autoencoders Identify Reasoning Features in Language Models?
by: Ma, George, et al.
Published: (2026)
AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations
by: Yao, Yifei, et al.
Published: (2025)
Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
by: Park, Seongwan, et al.
Published: (2025)
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
by: Cywiński, Bartosz, et al.
Published: (2025)
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
by: Karvonen, Adam, et al.
Published: (2025)
Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models
by: Yeung, Calvin, et al.
Published: (2026)
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)
Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
by: Klenitskiy, Anton, et al.
Published: (2025)
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
by: Thasarathan, Harrish, et al.
Published: (2025)
Ensembling Sparse Autoencoders
by: Gadgil, Soham, et al.
Published: (2025)
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)
Linear Dynamics in the RLVR Training of Large Language Models
by: Wang, Tianle, et al.
Published: (2026)