:: Library Catalog

Bibliographic Details
Main Authors:	Biecek, Przemyslaw; Longo, Luca; Zhou, Jianlong; Fel, Thomas; Holzinger, Andreas; Samek, Wojciech
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2606.01189

Similar Items

Model Science: getting serious about verification, explanation and control of AI systems
by: Biecek, Przemyslaw, et al.
Published: (2025)

Position: Explain to Question not to Justify
by: Biecek, Przemyslaw, et al.
Published: (2024)

CNN-based explanation ensembling for dataset, representation and explanations evaluation
by: Hryniewska-Guzik, Weronika, et al.
Published: (2024)

Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections
by: Spyrison, Nicholas, et al.
Published: (2022)

SwordBench: Evaluating Orthogonality of Steering Image Representations
by: Zaigrajew, Vladimir, et al.
Published: (2026)

Ethical ChatGPT: Concerns, Challenges, and Commandments
by: Zhou, Jianlong, et al.
Published: (2023)

Position: Do Not Explain Vision Models Without Context
by: Tomaszewska, Paulina, et al.
Published: (2024)

Your CLIP has 164 dimensions of noise: Exploring the embeddings covariance eigenspectrum of contrastively pretrained vision-language transformers
by: Grzywaczewski, Jakub, et al.
Published: (2026)

Global Counterfactual Directions
by: Sobieski, Bartlomiej, et al.
Published: (2024)

Adversarial attacks and defenses in explainable artificial intelligence: A survey
by: Baniecki, Hubert, et al.
Published: (2023)

Attributions All the Way Down? The Metagame of Interpretability
by: Baniecki, Hubert, et al.
Published: (2026)

Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
by: Fel, Thomas
Published: (2025)

XAI-guided Insulator Anomaly Detection for Imbalanced Datasets
by: Hoefler, Maximilian Andreas, et al.
Published: (2024)

X-ray transferable polyrepresentation learning
by: Hryniewska-Guzik, Weronika, et al.
Published: (2025)

Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)

Iterative Inference in a Chess-Playing Neural Network
by: Sandmann, Elias, et al.
Published: (2025)

Ensuring Medical AI Safety: Interpretability-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data
by: Pahde, Frederik, et al.
Published: (2025)

NormEnsembleXAI: Unveiling the Strengths and Weaknesses of XAI Ensemble Techniques
by: Hryniewska-Guzik, Weronika, et al.
Published: (2024)

Optimizing Federated Learning by Entropy-Based Client Selection
by: Lutz, Andreas, et al.
Published: (2024)

Atlas-Alignment: Making Interpretability Transferable Across Language Models
by: Puri, Bruno, et al.
Published: (2025)

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations
by: Dreyer, Maximilian, et al.
Published: (2023)

LINE: LLM-based Iterative Neuron Explanations for Vision Models
by: Zaigrajew, Vladimir, et al.
Published: (2026)

Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data
by: Kobylińska, Katarzyna, et al.
Published: (2023)

From Attribution to Action: A Human-Centered Application of Activation Steering
by: Labarta, Tobias, et al.
Published: (2026)

Steering CLIP's vision transformer with sparse autoencoders
by: Joseph, Sonia, et al.
Published: (2025)

Explaining Predictive Uncertainty by Exposing Second-Order Effects
by: Bley, Florian, et al.
Published: (2024)

Rethinking Visual Counterfactual Explanations Through Region Constraint
by: Sobieski, Bartlomiej, et al.
Published: (2024)

Sparse, Efficient and Explainable Data Attribution with DualXDA
by: Yolcu, Galip Ümit, et al.
Published: (2024)

The Dark Patterns of Personalized Persuasion in Large Language Models: Exposing Persuasive Linguistic Features for Big Five Personality Traits in LLMs Responses
by: Mieleszczenko-Kowszewicz, Wiktoria, et al.
Published: (2024)

From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance
by: Dreyer, Maximilian, et al.
Published: (2025)

Structural Compactness as a Complementary Criterion for Explanation Quality
by: Mesgari, Mohammad Mahdi, et al.
Published: (2026)

Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations
by: Erogullari, Eren, et al.
Published: (2025)

Mind What You Ask For: Emotional and Rational Faces of Persuasion by Large Language Models
by: Mieleszczenko-Kowszewicz, Wiktoria, et al.
Published: (2025)

Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs
by: Arndt, Jost, et al.
Published: (2025)

ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs
by: Becking, Daniel, et al.
Published: (2021)

System-Embedded Diffusion Bridge Models
by: Sobieski, Bartlomiej, et al.
Published: (2025)

Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
by: Bareeva, Dilyara, et al.
Published: (2024)

The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models
by: Müller, Heimo, et al.
Published: (2026)

$α$-TCAV: A Unified Framework for Testing with Concept Activation Vectors
by: Schnoor, Ekkehard, et al.
Published: (2026)

Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models
by: Sobieski, Bartlomiej, et al.
Published: (2026)