Library Catalog

Bibliographic Details
Main Author:	Poupart, Yoann
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.04028

Similar Items

TDHook: A Lightweight Framework for Interpretability
by: Poupart, Yoann
Published: (2025)

Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning
by: Poupart, Yoann, et al.
Published: (2025)

Iterative Inference in a Chess-Playing Neural Network
by: Sandmann, Elias, et al.
Published: (2025)

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
by: Jenner, Erik, et al.
Published: (2024)

Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)

Mechanistic Interpretability with Sparse Autoencoder Neural Operators
by: Tolooshams, Bahareh, et al.
Published: (2025)

Causal Interpretation of Sparse Autoencoder Features in Vision
by: Han, Sangyu, et al.
Published: (2025)

Neural Network-based Information Set Weighting for Playing Reconnaissance Blind Chess
by: Bertram, Timo, et al.
Published: (2024)

Mixture of Masters: Sparse Chess Language Models with Player Routing
by: Frisoni, Giacomo, et al.
Published: (2026)

Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
by: Jeong, Jihwan, et al.
Published: (2025)

UniMaia: Steering Chess Policies with Language for Human-like Play
by: Siu, Sherman, et al.
Published: (2026)

Towards Interpretable Protein Structure Prediction with Sparse Autoencoders
by: Parsan, Nithin, et al.
Published: (2025)

Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
by: Cywiński, Bartosz, et al.
Published: (2025)

Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)

Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control
by: Klenitskiy, Anton, et al.
Published: (2025)

Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models
by: Yeung, Calvin, et al.
Published: (2026)

ChessQA: Evaluating Large Language Models for Chess Understanding
by: Wen, Qianfeng, et al.
Published: (2025)

Amortized Planning with Large-Scale Transformers: A Case Study on Chess
by: Ruoss, Anian, et al.
Published: (2024)

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)

LouvreSAE: Sparse Autoencoders for Interpretable and Controllable Style Transfer
by: Panda, Raina, et al.
Published: (2025)

Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation
by: Zhao, Daniel, et al.
Published: (2025)

Why Online Reinforcement Learning is Causal
by: Schulte, Oliver, et al.
Published: (2024)

Sparse Autoencoders, Again?
by: Lu, Yin, et al.
Published: (2025)

IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
by: Li, Yuhan, et al.
Published: (2026)

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy
by: Balagansky, Nikita, et al.
Published: (2025)

SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations
by: Kim, Taehan, et al.
Published: (2025)

Interpreting Video Representations with Spatio-Temporal Sparse Autoencoders
by: Dokme, Atahan, et al.
Published: (2026)

SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability
by: Nasiri-Sarvi, Ali, et al.
Published: (2025)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
by: Shu, Dong, et al.
Published: (2025)

Complete Chess Games Enable LLM Become A Chess Master
by: Zhang, Yinqi, et al.
Published: (2025)

Measuring Sparse Autoencoder Feature Sensitivity
by: Tian, Claire, et al.
Published: (2025)

Generating Creative Chess Puzzles
by: Feng, Xidong, et al.
Published: (2025)

Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
by: Wang, Sai, et al.
Published: (2025)

How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding
by: Chen, Xi, et al.
Published: (2025)

ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
by: Liu, Jincheng, et al.
Published: (2025)

Constrain Alignment with Sparse Autoencoders
by: Yin, Qingyu, et al.
Published: (2024)

Are Sparse Autoencoder Benchmarks Reliable?
by: Chanin, David
Published: (2026)

TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
by: Huang, Victor Shea-Jay, et al.
Published: (2025)