| Main Authors: | Biecek, Przemyslaw, Longo, Luca, Zhou, Jianlong, Fel, Thomas, Holzinger, Andreas, Samek, Wojciech |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2606.01189 |
Similar Items
Model Science: getting serious about verification, explanation and control of AI systems
by: Biecek, Przemyslaw, et al.
Published: (2025)
Position: Explain to Question not to Justify
by: Biecek, Przemyslaw, et al.
Published: (2024)
CNN-based explanation ensembling for dataset, representation and explanations evaluation
by: Hryniewska-Guzik, Weronika, et al.
Published: (2024)
Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections
by: Spyrison, Nicholas, et al.
Published: (2022)
SwordBench: Evaluating Orthogonality of Steering Image Representations
by: Zaigrajew, Vladimir, et al.
Published: (2026)
Ethical ChatGPT: Concerns, Challenges, and Commandments
by: Zhou, Jianlong, et al.
Published: (2023)
Position: Do Not Explain Vision Models Without Context
by: Tomaszewska, Paulina, et al.
Published: (2024)
Your CLIP has 164 dimensions of noise: Exploring the embeddings covariance eigenspectrum of contrastively pretrained vision-language transformers
by: Grzywaczewski, Jakub, et al.
Published: (2026)
Global Counterfactual Directions
by: Sobieski, Bartlomiej, et al.
Published: (2024)
Adversarial attacks and defenses in explainable artificial intelligence: A survey
by: Baniecki, Hubert, et al.
Published: (2023)
Attributions All the Way Down? The Metagame of Interpretability
by: Baniecki, Hubert, et al.
Published: (2026)
Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
by: Fel, Thomas
Published: (2025)
XAI-guided Insulator Anomaly Detection for Imbalanced Datasets
by: Hoefler, Maximilian Andreas, et al.
Published: (2024)
X-ray transferable polyrepresentation learning
by: Hryniewska-Guzik, Weronika, et al.
Published: (2025)
Interpreting CLIP with Hierarchical Sparse Autoencoders
by: Zaigrajew, Vladimir, et al.
Published: (2025)
Iterative Inference in a Chess-Playing Neural Network
by: Sandmann, Elias, et al.
Published: (2025)
Ensuring Medical AI Safety: Interpretability-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data
by: Pahde, Frederik, et al.
Published: (2025)
NormEnsembleXAI: Unveiling the Strengths and Weaknesses of XAI Ensemble Techniques
by: Hryniewska-Guzik, Weronika, et al.
Published: (2024)
Optimizing Federated Learning by Entropy-Based Client Selection
by: Lutz, Andreas, et al.
Published: (2024)
Atlas-Alignment: Making Interpretability Transferable Across Language Models
by: Puri, Bruno, et al.
Published: (2025)
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations
by: Dreyer, Maximilian, et al.
Published: (2023)
LINE: LLM-based Iterative Neuron Explanations for Vision Models
by: Zaigrajew, Vladimir, et al.
Published: (2026)
Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data
by: Kobylińska, Katarzyna, et al.
Published: (2023)
From Attribution to Action: A Human-Centered Application of Activation Steering
by: Labarta, Tobias, et al.
Published: (2026)
Steering CLIP's vision transformer with sparse autoencoders
by: Joseph, Sonia, et al.
Published: (2025)
Explaining Predictive Uncertainty by Exposing Second-Order Effects
by: Bley, Florian, et al.
Published: (2024)
Rethinking Visual Counterfactual Explanations Through Region Constraint
by: Sobieski, Bartlomiej, et al.
Published: (2024)
Sparse, Efficient and Explainable Data Attribution with DualXDA
by: Yolcu, Galip Ümit, et al.
Published: (2024)
The Dark Patterns of Personalized Persuasion in Large Language Models: Exposing Persuasive Linguistic Features for Big Five Personality Traits in LLMs Responses
by: Mieleszczenko-Kowszewicz, Wiktoria, et al.
Published: (2024)
From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance
by: Dreyer, Maximilian, et al.
Published: (2025)
Structural Compactness as a Complementary Criterion for Explanation Quality
by: Mesgari, Mohammad Mahdi, et al.
Published: (2026)
Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations
by: Erogullari, Eren, et al.
Published: (2025)
Mind What You Ask For: Emotional and Rational Faces of Persuasion by Large Language Models
by: Mieleszczenko-Kowszewicz, Wiktoria, et al.
Published: (2025)
Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs
by: Arndt, Jost, et al.
Published: (2025)
ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs
by: Becking, Daniel, et al.
Published: (2021)
System-Embedded Diffusion Bridge Models
by: Sobieski, Bartlomiej, et al.
Published: (2025)
Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
by: Bareeva, Dilyara, et al.
Published: (2024)
The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models
by: Müller, Heimo, et al.
Published: (2026)
$α$-TCAV: A Unified Framework for Testing with Concept Activation Vectors
by: Schnoor, Ekkehard, et al.
Published: (2026)
Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models
by: Sobieski, Bartlomiej, et al.
Published: (2026)