| Main Authors: | Li, T. Ed, Ren, Junyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.08855 |
Similar Items
Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)
Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)
Improving Steering Vectors by Targeting Sparse Autoencoder Features
by: Chalnev, Sviatoslav, et al.
Published: (2024)
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
by: Chanin, David, et al.
Published: (2025)
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)
Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models
by: Mishra, Anurag
Published: (2026)
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
by: Zhu, Xudong, et al.
Published: (2025)
Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder
by: Yang, Xianjun, et al.
Published: (2025)
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
by: Lan, Michael, et al.
Published: (2024)
FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies
by: Cho, Seonglae, et al.
Published: (2025)
Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
by: Li, Aaron J., et al.
Published: (2025)
Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction
by: Sainsbury, Chris, et al.
Published: (2026)
Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures
by: Muchane, Mark, et al.
Published: (2025)
Trainable Dynamic Mask Sparse Attention
by: Shi, Jingze, et al.
Published: (2025)
Training-Trajectory-Aware Token Selection
by: Shen, Zhanming, et al.
Published: (2026)
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
by: Farnik, Lucy, et al.
Published: (2025)
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
by: Christopoulou, Fenia, et al.
Published: (2024)
SAEMark: Steering Personalized Multilingual LLM Watermarks with Sparse Autoencoders
by: Yu, Zhuohao, et al.
Published: (2025)
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
by: Minegishi, Gouki, et al.
Published: (2025)
Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
by: Xu, Zihang, et al.
Published: (2026)
Towards Understanding the Robustness of Sparse Autoencoders
by: Saiyed, Ahson, et al.
Published: (2026)
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
by: Jing, Yi, et al.
Published: (2026)
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
by: Shu, Dong, et al.
Published: (2025)
Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines
by: Jørgensen, Mikkel Godsk, et al.
Published: (2026)
Noise-Aware Training of Layout-Aware Language Models
by: Sarkhel, Ritesh, et al.
Published: (2024)
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
by: Cho, Seonglae, et al.
Published: (2025)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)
Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations
by: Joshi, Shruti, et al.
Published: (2025)
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
by: Lieberum, Tom, et al.
Published: (2024)
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)
Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
by: Li, Ed, et al.
Published: (2025)
Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training
by: Liu, Chi, et al.
Published: (2026)
Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2025)
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
by: He, Zirui, et al.
Published: (2025)
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
by: Muhamed, Aashiq, et al.
Published: (2024)
Adaptive Pruning for Large Language Models with Structural Importance Awareness
by: Zheng, Haotian, et al.
Published: (2024)
SR-TTT: Surprisal-Aware Residual Test-Time Training
by: P, Swamynathan V
Published: (2026)
Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
by: Peng, Kenny, et al.
Published: (2025)
Understanding and Accelerating the Training of Masked Diffusion Language Models
by: Hong, Chunsan, et al.
Published: (2026)