| Main Authors: | Lopardo, Gianluigi; Precioso, Frederic; Garreau, Damien |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.03485 |
Similar Items
Understanding Post-hoc Explainers: The Case of Anchors
by: Lopardo, Gianluigi, et al.
Published: (2023)
Faithful and Robust Local Interpretability for Textual Predictions
by: Lopardo, Gianluigi, et al.
Published: (2023)
A Sea of Words: An In-Depth Analysis of Anchors for Text Data
by: Lopardo, Gianluigi, et al.
Published: (2022)
Comparing Feature Importance and Rule Extraction for Interpretability on Text Data
by: Lopardo, Gianluigi, et al.
Published: (2022)
MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations
by: Mitsuzawa, Kensuke, et al.
Published: (2025)
Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
by: Razafindralambo, Raphaël, et al.
Published: (2026)
Towards Understanding Steering Strength
by: Taimeskhanov, Magamed, et al.
Published: (2026)
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)
When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models
by: Razafindralambo, Raphaël, et al.
Published: (2026)
Harnessing Large Language Models as Post-hoc Correctors
by: Zhong, Zhiqiang, et al.
Published: (2024)
WolBanking77: Wolof Banking Speech Intent Classification Dataset
by: Kandji, Abdou Karim, et al.
Published: (2025)
How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective
by: Peng, Runyu, et al.
Published: (2026)
A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
by: Meadows, Jordan, et al.
Published: (2023)
The Effect of Model Size on LLM Post-hoc Explainability via LIME
by: Heyen, Henning, et al.
Published: (2024)
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
by: Dhaini, Mahdi, et al.
Published: (2025)
Rethinking Expert Trajectory Utilization in LLM Post-training for Mathematical Reasoning
by: Ding, Bowen, et al.
Published: (2025)
LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models
by: Abdelhalim, Alhassan, et al.
Published: (2026)
CAM-Based Methods Can See through Walls
by: Taimeskhanov, Magamed, et al.
Published: (2024)
The Risks of Recourse in Binary Classification
by: Fokkema, Hidde, et al.
Published: (2023)
Feature Attribution from First Principles
by: Taimeskhanov, Magamed, et al.
Published: (2025)
On The Variability of Concept Activation Vectors
by: Wenkmann, Julia, et al.
Published: (2025)
Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)
When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective
by: Zhang, Zelin, et al.
Published: (2026)
Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
by: Zhao, Dachuan, et al.
Published: (2025)
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
by: Hu, Mengya, et al.
Published: (2024)
Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention
by: Kiruluta, Andrew
Published: (2025)
Post-Training Sparse Attention with Double Sparsity
by: Yang, Shuo, et al.
Published: (2024)
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
by: Ayonrinde, Kola, et al.
Published: (2025)
Interpretable Generative Models through Post-hoc Concept Bottlenecks
by: Kulkarni, Akshay, et al.
Published: (2025)
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
by: Xian, Ruicheng, et al.
Published: (2025)
Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models
by: kurra, Sailesh kiran, et al.
Published: (2026)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
by: Sun, Hao, et al.
Published: (2025)
Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives
by: Kitada, Shunsuke
Published: (2023)
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
by: Neo, Clement, et al.
Published: (2024)
Mind the map! Accounting for existing map information when estimating online HDMaps from sensor
by: Sun, Rémy, et al.
Published: (2023)
Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
by: Bai, Xueying, et al.
Published: (2024)
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
by: He, Shenghua, et al.
Published: (2025)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
by: Liu, Qi, et al.
Published: (2025)
A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models
by: de la Brosse, Augustin, et al.
Published: (2026)
Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism
by: Bhuiyan, Mohaiminul Islam, et al.
Published: (2025)