| Main Authors: | Zhang, Yang, Li, Yawei, Brown, Hannah, Rezaei, Mina, Bischl, Bernd, Torr, Philip, Khakzar, Ashkan, Kawaguchi, Kenji |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2310.06514 |
Similar Items
A Dual-Perspective Approach to Evaluating Feature Attribution Methods
by: Li, Yawei, et al.
Published: (2023)
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
by: Zhang, Yang, et al.
Published: (2024)
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
by: Li, Yawei, et al.
Published: (2025)
On Discrepancies between Perturbation Evaluations of Graph Neural Network Attributions
by: Rezaei, Razieh, et al.
Published: (2024)
RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching
by: Jafari, Farnoush Rezaei, et al.
Published: (2025)
Minimalist Concept Erasure in Generative Models
by: Zhang, Yang, et al.
Published: (2025)
How Visual Representations Map to Language Feature Space in Multimodal LLMs
by: Venhoff, Constantin, et al.
Published: (2025)
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
by: Lan, Michael, et al.
Published: (2024)
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval
by: Venhoff, Constantin, et al.
Published: (2025)
Structured Credal Learning
by: Venkatesh, Varun, et al.
Published: (2026)
What Is Fairness? On the Role of Protected Attributes and Fictitious Worlds
by: Bothmann, Ludwig, et al.
Published: (2022)
Learnable Sparsity for Vision Generative Models
by: Zhang, Yang, et al.
Published: (2024)
Towards Understanding Multimodal Fine-Tuning: Spatial Features
by: Naghashyar, Lachin, et al.
Published: (2026)
Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning
by: Dorigatti, Emilio, et al.
Published: (2022)
Latent Guard: a Safety Framework for Text-to-image Generation
by: Liu, Runtao, et al.
Published: (2024)
Learning Visual Prompts for Guiding the Attention of Vision Transformers
by: Rezaei, Razieh, et al.
Published: (2024)
Mixture of Experts Made Intrinsically Interpretable
by: Yang, Xingyi, et al.
Published: (2025)
The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity
by: Caraker, Drake, et al.
Published: (2026)
Articulate3D: Zero-Shot Text-Driven 3D Object Posing
by: Deb, Oishi, et al.
Published: (2025)
Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning
by: Saggau, Daniel, et al.
Published: (2023)
Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability
by: Edin, Joakim, et al.
Published: (2024)
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
by: Liu, Runtao, et al.
Published: (2024)
Ideal Attribution and Faithful Watermarks for Language Models
by: Song, Min Jae, et al.
Published: (2025)
Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information
by: Huang, Xin, et al.
Published: (2026)
On the Robustness of Global Feature Effect Explanations
by: Baniecki, Hubert, et al.
Published: (2024)
Incorporating Attribution Importance for Improving Faithfulness Metrics
by: Zhao, Zhixue, et al.
Published: (2023)
Beyond Output Faithfulness: Learning Attributions that Preserve Computational Pathways
by: Zhang, Siyu, et al.
Published: (2025)
Single Character Perturbations Break LLM Alignment
by: Lin, Leon, et al.
Published: (2024)
The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms
by: Davies, Adam, et al.
Published: (2024)
The Attribution Contract: Feature Attribution for Generative Language Models
by: Nguyen, Giang
Published: (2026)
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
by: Brown, Hannah, et al.
Published: (2024)
Analyzing Error Sources in Global Feature Effect Estimation
by: Heiß, Timo, et al.
Published: (2026)
Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings
by: Weber, Tobias, et al.
Published: (2023)
Decomposing Global Feature Effects Based on Feature Interactions
by: Herbinger, Julia, et al.
Published: (2023)
ABE: A Unified Framework for Robust and Faithful Attribution-Based Explainability
by: Zhu, Zhiyu, et al.
Published: (2025)
Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data
by: Su, Yiheng, et al.
Published: (2023)
Impossibility Theorems for Feature Attribution
by: Bilodeau, Blair, et al.
Published: (2022)
How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines
by: Deng, Junwei, et al.
Published: (2026)
Feature Attribution from First Principles
by: Taimeskhanov, Magamed, et al.
Published: (2025)
Disentangling Interactions and Dependencies in Feature Attribution
by: König, Gunnar, et al.
Published: (2024)