:: Library Catalog

Bibliographic Details
Main Authors:	Li, T. Ed; Ren, Junyu
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2510.08855

Similar Items

Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)

Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)

Improving Steering Vectors by Targeting Sparse Autoencoder Features
by: Chalnev, Sviatoslav, et al.
Published: (2024)

Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)

Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)

Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models
by: Mishra, Anurag
Published: (2026)

AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
by: Zhu, Xudong, et al.
Published: (2025)

Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder
by: Yang, Xianjun, et al.
Published: (2025)

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
by: Lan, Michael, et al.
Published: (2024)

FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies
by: Cho, Seonglae, et al.
Published: (2025)

Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
by: Li, Aaron J., et al.
Published: (2025)

Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
by: Sainsbury, Chris, et al.
Published: (2026)

Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures
by: Muchane, Mark, et al.
Published: (2025)

Trainable Dynamic Mask Sparse Attention
by: Shi, Jingze, et al.
Published: (2025)

Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)

Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
by: Farnik, Lucy, et al.
Published: (2025)

SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
by: Christopoulou, Fenia, et al.
Published: (2024)

SAEMark: Steering Personalized Multilingual LLM Watermarks with Sparse Autoencoders
by: Yu, Zhuohao, et al.
Published: (2025)

Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
by: Minegishi, Gouki, et al.
Published: (2025)

Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
by: Xu, Zihang, et al.
Published: (2026)

Towards Understanding the Robustness of Sparse Autoencoders
by: Saiyed, Ahson, et al.
Published: (2026)

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
by: Jing, Yi, et al.
Published: (2026)

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines
by: Jørgensen, Mikkel Godsk, et al.
Published: (2026)

Noise-Aware Training of Layout-Aware Language Models
by: Sarkhel, Ritesh, et al.
Published: (2024)

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)

CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
by: Cho, Seonglae, et al.
Published: (2025)

A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)

Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations
by: Joshi, Shruti, et al.
Published: (2025)

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
by: Lieberum, Tom, et al.
Published: (2024)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
by: Li, Ed, et al.
Published: (2025)

Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training
by: Liu, Chi, et al.
Published: (2026)

Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2025)

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
by: He, Zirui, et al.
Published: (2025)

Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
by: Muhamed, Aashiq, et al.
Published: (2024)

Adaptive Pruning for Large Language Models with Structural Importance Awareness
by: Zheng, Haotian, et al.
Published: (2024)

SR-TTT: Surprisal-Aware Residual Test-Time Training
by: P, Swamynathan V
Published: (2026)

Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
by: Peng, Kenny, et al.
Published: (2025)

Understanding and Accelerating the Training of Masked Diffusion Language Models
by: Hong, Chunsan, et al.
Published: (2026)