| Main Authors: | Davies, Adam, Khakzar, Ashkan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.05859 |
Similar Items
A Dual-Perspective Approach to Evaluating Feature Attribution Methods
by: Li, Yawei, et al.
Published: (2023)
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
by: Lan, Michael, et al.
Published: (2024)
Interpretable Link Prediction in AI-Driven Cancer Research: Uncovering Co-Authorship Patterns
by: Mosallaie, Shahab, et al.
Published: (2025)
How to Interpret Agent Behavior
by: Gao, Jie, et al.
Published: (2026)
Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability
by: Jiang, Wen-Dong, et al.
Published: (2024)
Latent Guard: a Safety Framework for Text-to-image Generation
by: Liu, Runtao, et al.
Published: (2024)
Interpretable Representations in Explainable AI: From Theory to Practice
by: Sokol, Kacper, et al.
Published: (2020)
OntoPret: An Ontology for the Interpretation of Human Behavior
by: Ellis, Alexis, et al.
Published: (2025)
A Logic of Uncertain Interpretation
by: Bjorndahl, Adam
Published: (2025)
From Basis to Basis: Gaussian Particle Representation for Interpretable PDE Operators
by: Li, Zhihao, et al.
Published: (2026)
Challenges in Mechanistically Interpreting Model Representations
by: Golechha, Satvik, et al.
Published: (2024)
Representation and Interpretation in Artificial and Natural Computing
by: Pineda, Luis A.
Published: (2025)
Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?
by: Zhong, Ziqian, et al.
Published: (2026)
Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution
by: Konigsberg, Amir
Published: (2026)
Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning
by: Sani, Numair, et al.
Published: (2020)
Cognitive BASIC: An In-Model Interpreted Reasoning Language for LLMs
by: Kramer, Oliver
Published: (2025)
Representations as Language: An Information-Theoretic Framework for Interpretability
by: Conklin, Henry, et al.
Published: (2024)
Interpretable Representation Learning for Additive Rule Ensembles
by: Behzadimanesh, Shahrzad, et al.
Published: (2025)
Interpretable Neural Networks with Random Constructive Algorithm
by: Nan, Jing, et al.
Published: (2023)
MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
by: He, Jesse, et al.
Published: (2026)
Shared Lexical Task Representations Explain Behavioral Variability In LLMs
by: Yang, Zhuonan, et al.
Published: (2026)
Pragmatic Policy Development via Interpretable Behavior Cloning
by: Matsson, Anton, et al.
Published: (2025)
From artificial to organic: Rethinking the roots of intelligence for digital health
by: Ghimire, Prajwal, et al.
Published: (2025)
Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
by: Bhambri, Siddhant, et al.
Published: (2025)
A Biologically Interpretable Cognitive Architecture for Online Structuring of Episodic Memories into Cognitive Maps
by: Dzhivelikian, E. A., et al.
Published: (2025)
Can Interpretation Predict Behavior on Unseen Data?
by: Li, Victoria R., et al.
Published: (2025)
Intuitionistic Fuzzy Cognitive Maps for Interpretable Image Classification
by: Sovatzidi, Georgia, et al.
Published: (2024)
Learning Interpretable Rules for Scalable Data Representation and Classification
by: Wang, Zhuo, et al.
Published: (2023)
Evaluating Simplification Algorithms for Interpretability of Time Series Classification
by: Håvardstun, Brigt, et al.
Published: (2025)
Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
by: Dixit, Satvik, et al.
Published: (2024)
Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences
by: Fashae, Emmanuel, et al.
Published: (2025)
stl2vec: Semantic and Interpretable Vector Representation of Temporal Logic
by: Saveri, Gaia, et al.
Published: (2024)
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
by: Geiger, Atticus, et al.
Published: (2023)
Interpretable Pre-Trained Transformers for Heart Time-Series Data
by: Davies, Harry J., et al.
Published: (2024)
Discovering Interpretable Algorithms by Decompiling Transformers to RASP
by: Huang, Xinting, et al.
Published: (2026)
From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation
by: Li, Peilang, et al.
Published: (2025)
From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants
by: Danry, Valdemar, et al.
Published: (2026)
Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring
by: Bravo-Rocca, Gusseppe, et al.
Published: (2025)
Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection
by: Da Ros, Francesca, et al.
Published: (2025)
Learning Interpretable Low-dimensional Representation via Physical Symmetry
by: Liu, Xuanjie, et al.
Published: (2023)