| Main Author: | Poupart, Yoann |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04028 |
Similar Items
TDHook: A Lightweight Framework for Interpretability
by: Poupart, Yoann
Published: (2025)
Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning
by: Poupart, Yoann, et al.
Published: (2025)
Iterative Inference in a Chess-Playing Neural Network
by: Sandmann, Elias, et al.
Published: (2025)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
by: Jenner, Erik, et al.
Published: (2024)
Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)
Mechanistic Interpretability with Sparse Autoencoder Neural Operators
by: Tolooshams, Bahareh, et al.
Published: (2025)
Causal Interpretation of Sparse Autoencoder Features in Vision
by: Han, Sangyu, et al.
Published: (2025)
Neural Network-based Information Set Weighting for Playing Reconnaissance Blind Chess
by: Bertram, Timo, et al.
Published: (2024)
Mixture of Masters: Sparse Chess Language Models with Player Routing
by: Frisoni, Giacomo, et al.
Published: (2026)
Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
by: Jeong, Jihwan, et al.
Published: (2025)
UniMaia: Steering Chess Policies with Language for Human-like Play
by: Siu, Sherman, et al.
Published: (2026)
Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
by: Cywiński, Bartosz, et al.
Published: (2025)
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)
Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
by: Klenitskiy, Anton, et al.
Published: (2025)
Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models
by: Yeung, Calvin, et al.
Published: (2026)
ChessQA: Evaluating Large Language Models for Chess Understanding
by: Wen, Qianfeng, et al.
Published: (2025)
Amortized Planning with Large-Scale Transformers: A Case Study on Chess
by: Ruoss, Anian, et al.
Published: (2024)
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)
LouvreSAE: Sparse Autoencoders for Interpretable and Controllable Style Transfer
by: Panda, Raina, et al.
Published: (2025)
Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation
by: Zhao, Daniel, et al.
Published: (2025)
Why Online Reinforcement Learning is Causal
by: Schulte, Oliver, et al.
Published: (2024)
Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)
IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
by: Li, Yuhan, et al.
Published: (2026)
Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy
by: Balagansky, Nikita, et al.
Published: (2025)
SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations
by: Kim, Taehan, et al.
Published: (2025)
Interpreting Video Representations with Spatio-Temporal Sparse Autoencoders
by: Dokme, Atahan, et al.
Published: (2026)
SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability
by: Nasiri-Sarvi, Ali, et al.
Published: (2025)
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)
Complete Chess Games Enable LLM Become A Chess Master
by: Zhang, Yinqi, et al.
Published: (2025)
Measuring Sparse Autoencoder Feature Sensitivity
by: Tian, Claire, et al.
Published: (2025)
Generating Creative Chess Puzzles
by: Feng, Xidong, et al.
Published: (2025)
Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
by: Wang, Sai, et al.
Published: (2025)
How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding
by: Chen, Xi, et al.
Published: (2025)
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
by: Liu, Jincheng, et al.
Published: (2025)
Constrain Alignment with Sparse Autoencoders
by: Yin, Qingyu, et al.
Published: (2024)
Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)
TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
by: Huang, Victor Shea-Jay, et al.
Published: (2025)