:: Library Catalog

Bibliographic Details
Main Authors:	Lopardo, Gianluigi; Precioso, Frederic; Garreau, Damien
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.03485

Similar Items

Understanding Post-hoc Explainers: The Case of Anchors
by: Lopardo, Gianluigi, et al.
Published: (2023)

Faithful and Robust Local Interpretability for Textual Predictions
by: Lopardo, Gianluigi, et al.
Published: (2023)

A Sea of Words: An In-Depth Analysis of Anchors for Text Data
by: Lopardo, Gianluigi, et al.
Published: (2022)

Comparing Feature Importance and Rule Extraction for Interpretability on Text Data
by: Lopardo, Gianluigi, et al.
Published: (2022)

MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations
by: Mitsuzawa, Kensuke, et al.
Published: (2025)

Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
by: Razafindralambo, Raphaël, et al.
Published: (2026)

Towards Understanding Steering Strength
by: Taimeskhanov, Magamed, et al.
Published: (2026)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)

When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models
by: Razafindralambo, Raphaël, et al.
Published: (2026)

Harnessing Large Language Models as Post-hoc Correctors
by: Zhong, Zhiqiang, et al.
Published: (2024)

WolBanking77: Wolof Banking Speech Intent Classification Dataset
by: Kandji, Abdou Karim, et al.
Published: (2025)

How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective
by: Peng, Runyu, et al.
Published: (2026)

A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
by: Meadows, Jordan, et al.
Published: (2023)

The Effect of Model Size on LLM Post-hoc Explainability via LIME
by: Heyen, Henning, et al.
Published: (2024)

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
by: Dhaini, Mahdi, et al.
Published: (2025)

Rethinking Expert Trajectory Utilization in LLM Post-training for Mathematical Reasoning
by: Ding, Bowen, et al.
Published: (2025)

LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models
by: Abdelhalim, Alhassan, et al.
Published: (2026)

CAM-Based Methods Can See through Walls
by: Taimeskhanov, Magamed, et al.
Published: (2024)

The Risks of Recourse in Binary Classification
by: Fokkema, Hidde, et al.
Published: (2023)

Feature Attribution from First Principles
by: Taimeskhanov, Magamed, et al.
Published: (2025)

On The Variability of Concept Activation Vectors
by: Wenkmann, Julia, et al.
Published: (2025)

Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)

When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective
by: Zhang, Zelin, et al.
Published: (2026)

Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
by: Zhao, Dachuan, et al.
Published: (2025)

SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
by: Hu, Mengya, et al.
Published: (2024)

Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention
by: Kiruluta, Andrew
Published: (2025)

Post-Training Sparse Attention with Double Sparsity
by: Yang, Shuo, et al.
Published: (2024)

A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
by: Ayonrinde, Kola, et al.
Published: (2025)

Interpretable Generative Models through Post-hoc Concept Bottlenecks
by: Kulkarni, Akshay, et al.
Published: (2025)

Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
by: Xian, Ruicheng, et al.
Published: (2025)

Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models
by: Kurra, Sailesh Kiran, et al.
Published: (2026)

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
by: Sun, Hao, et al.
Published: (2025)

Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives
by: Kitada, Shunsuke
Published: (2023)

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
by: Neo, Clement, et al.
Published: (2024)

Mind the map! Accounting for existing map information when estimating online HDMaps from sensor
by: Sun, Rémy, et al.
Published: (2023)

Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
by: Bai, Xueying, et al.
Published: (2024)

Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
by: He, Shenghua, et al.
Published: (2025)

How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
by: Liu, Qi, et al.
Published: (2025)

A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models
by: de la Brosse, Augustin, et al.
Published: (2026)

Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism
by: Bhuiyan, Mohaiminul Islam, et al.
Published: (2025)