Library Catalog

Bibliographic Details
Main Authors:	Davies, Adam; Khakzar, Ashkan
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.05859

Similar Items

A Dual-Perspective Approach to Evaluating Feature Attribution Methods
by: Li, Yawei, et al.
Published: (2023)

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
by: Lan, Michael, et al.
Published: (2024)

Interpretable Link Prediction in AI-Driven Cancer Research: Uncovering Co-Authorship Patterns
by: Mosallaie, Shahab, et al.
Published: (2025)

How to Interpret Agent Behavior
by: Gao, Jie, et al.
Published: (2026)

Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability
by: Jiang, Wen-Dong, et al.
Published: (2024)

Latent Guard: a Safety Framework for Text-to-image Generation
by: Liu, Runtao, et al.
Published: (2024)

Interpretable Representations in Explainable AI: From Theory to Practice
by: Sokol, Kacper, et al.
Published: (2020)

OntoPret: An Ontology for the Interpretation of Human Behavior
by: Ellis, Alexis, et al.
Published: (2025)

A Logic of Uncertain Interpretation
by: Bjorndahl, Adam
Published: (2025)

From Basis to Basis: Gaussian Particle Representation for Interpretable PDE Operators
by: Li, Zhihao, et al.
Published: (2026)

Challenges in Mechanistically Interpreting Model Representations
by: Golechha, Satvik, et al.
Published: (2024)

Representation and Interpretation in Artificial and Natural Computing
by: Pineda, Luis A.
Published: (2025)

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?
by: Zhong, Ziqian, et al.
Published: (2026)

Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution
by: Konigsberg, Amir
Published: (2026)

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning
by: Sani, Numair, et al.
Published: (2020)

Cognitive BASIC: An In-Model Interpreted Reasoning Language for LLMs
by: Kramer, Oliver
Published: (2025)

Representations as Language: An Information-Theoretic Framework for Interpretability
by: Conklin, Henry, et al.
Published: (2024)

Interpretable Representation Learning for Additive Rule Ensembles
by: Behzadimanesh, Shahrzad, et al.
Published: (2025)

Interpretable Neural Networks with Random Constructive Algorithm
by: Nan, Jing, et al.
Published: (2023)

MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
by: He, Jesse, et al.
Published: (2026)

Shared Lexical Task Representations Explain Behavioral Variability In LLMs
by: Yang, Zhuonan, et al.
Published: (2026)

Pragmatic Policy Development via Interpretable Behavior Cloning
by: Matsson, Anton, et al.
Published: (2025)

From artificial to organic: Rethinking the roots of intelligence for digital health
by: Ghimire, Prajwal, et al.
Published: (2025)

Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
by: Bhambri, Siddhant, et al.
Published: (2025)

A Biologically Interpretable Cognitive Architecture for Online Structuring of Episodic Memories into Cognitive Maps
by: Dzhivelikian, E. A., et al.
Published: (2025)

Can Interpretation Predict Behavior on Unseen Data?
by: Li, Victoria R., et al.
Published: (2025)

Intuitionistic Fuzzy Cognitive Maps for Interpretable Image Classification
by: Sovatzidi, Georgia, et al.
Published: (2024)

Learning Interpretable Rules for Scalable Data Representation and Classification
by: Wang, Zhuo, et al.
Published: (2023)

Evaluating Simplification Algorithms for Interpretability of Time Series Classification
by: Håvardstun, Brigt, et al.
Published: (2025)

Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
by: Dixit, Satvik, et al.
Published: (2024)

Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences
by: Fashae, Emmanuel, et al.
Published: (2025)

stl2vec: Semantic and Interpretable Vector Representation of Temporal Logic
by: Saveri, Gaia, et al.
Published: (2024)

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
by: Geiger, Atticus, et al.
Published: (2023)

Interpretable Pre-Trained Transformers for Heart Time-Series Data
by: Davies, Harry J., et al.
Published: (2024)

Discovering Interpretable Algorithms by Decompiling Transformers to RASP
by: Huang, Xinting, et al.
Published: (2026)

From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation
by: Li, Peilang, et al.
Published: (2025)

From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants
by: Danry, Valdemar, et al.
Published: (2026)

Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring
by: Bravo-Rocca, Gusseppe, et al.
Published: (2025)

Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection
by: Da Ros, Francesca, et al.
Published: (2025)

Learning Interpretable Low-dimensional Representation via Physical Symmetry
by: Liu, Xuanjie, et al.
Published: (2023)