Library Catalog

Bibliographic Details
Main Authors:	Elhadri, Khawla; Michalski, Tomasz; Wróbel, Adam; Schlötterer, Jörg; Zieliński, Bartosz; Seifert, Christin
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.09340

Similar Items

Towards Interpretable Deep Neural Networks for Tabular Data
by: Elhadri, Khawla, et al.
Published: (2025)

XNNTab -- Interpretable Neural Networks for Tabular Data using Sparse Autoencoders
by: Elhadri, Khawla, et al.
Published: (2025)

Persuasion Tokens for Editing Factual Knowledge in LLMs
by: Youssef, Paul, et al.
Published: (2026)

Invariant Learning with Annotation-free Environments
by: Le, Phuong Quynh, et al.
Published: (2025)

Out of Spuriousity: Improving Robustness to Spurious Correlations without Group Annotations
by: Le, Phuong Quynh, et al.
Published: (2024)

A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study
by: Koraş, Osman Alperen, et al.
Published: (2024)

An XAI-based Analysis of Shortcut Learning in Neural Networks
by: Le, Phuong Quynh, et al.
Published: (2025)

Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?
by: Le, Phuong Quynh, et al.
Published: (2023)

Shortcut Mitigation via Spurious-Positive Samples
by: Le, Phuong Quynh, et al.
Published: (2026)

Personalized Interpretability -- Interactive Alignment of Prototypical Parts Networks
by: Michalski, Tomasz, et al.
Published: (2025)

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
by: Kuhn, Lukas, et al.
Published: (2025)

OMENN: One Matrix to Explain Neural Networks
by: Wróbel, Adam, et al.
Published: (2024)

Prototype-based Interpretable Breast Cancer Prediction Models: Analysis and Challenges
by: Pathak, Shreyasi, et al.
Published: (2024)

ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification
by: Janusz, Mikołaj, et al.
Published: (2026)

The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs
by: Youssef, Paul, et al.
Published: (2024)

Enhancing Fact Retrieval in PLMs through Truthfulness
by: Youssef, Paul, et al.
Published: (2024)

Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis
by: Musiał, Sebastian, et al.
Published: (2025)

LucidPPN: Unambiguous Prototypical Parts Network for User-centric Interpretable Computer Vision
by: Pach, Mateusz, et al.
Published: (2024)

Guiding LLMs to Generate High-Fidelity and High-Quality Counterfactual Explanations for Text Classification
by: Nguyen, Van Bach, et al.
Published: (2025)

From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent
by: Nguyen, Van Bach, et al.
Published: (2022)

CEval: A Benchmark for Evaluating Counterfactual Text Generation
by: Nguyen, Van Bach, et al.
Published: (2024)

Has this Fact been Edited? Detecting Knowledge Edits in Language Models
by: Youssef, Paul, et al.
Published: (2024)

Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer's Disease classification
by: De Santi, Lisa Anita, et al.
Published: (2024)

Enhancing Chemical Explainability Through Counterfactual Masking
by: Janisiów, Łukasz, et al.
Published: (2025)

DAVE: Distribution-aware Attribution via ViT Gradient Decomposition
by: Wróbel, Adam, et al.
Published: (2026)

Behavioral Analysis of Information Salience in Large Language Models
by: Trienes, Jan, et al.
Published: (2025)

Tracing and Reversing Edits in LLMs
by: Youssef, Paul, et al.
Published: (2025)

How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
by: Youssef, Paul, et al.
Published: (2024)

Comparative Explanations: Explanation Guided Decision Making for Human-in-the-Loop Preference Selection
by: Chakraborty, Tanmay, et al.
Published: (2025)

PhAME: Phenotype-Aware Molecular Editing via Latent Diffusion
by: Janisiów, Łukasz, et al.
Published: (2026)

Explainable Bayesian Optimization
by: Chakraborty, Tanmay, et al.
Published: (2024)

Position: Editing Large Language Models Poses Serious Safety Risks
by: Youssef, Paul, et al.
Published: (2025)

LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
by: Nguyen, Van Bach, et al.
Published: (2024)

Investigating the Impact of Randomness on Reproducibility in Computer Vision: A Study on Applications in Civil Engineering and Medicine
by: Eryılmaz, Bahadır, et al.
Published: (2024)

Efficient LLM Moderation with Multi-Layer Latent Prototypes
by: Chrabąszcz, Maciej, et al.
Published: (2025)

ProPML: Probability Partial Multi-label Learning
by: Struski, Łukasz, et al.
Published: (2024)

SONG: Self-Organizing Neural Graphs
by: Struski, Łukasz, et al.
Published: (2021)

One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them
by: Holmov, Ali, et al.
Published: (2026)

Efficient Multi-Source Knowledge Transfer by Model Merging
by: Osial, Marcin, et al.
Published: (2025)

Can Fine-Tuning Erase Your Edits? On the Fragile Coexistence of Knowledge Editing and Adaptation
by: Cheng, Yinjie, et al.
Published: (2025)