Bibliographic Details
Main Authors: Cheng, Furui; Zouhar, Vilém; Chan, Robin Shing Moon; Fürst, Daniel; Strobelt, Hendrik; El-Assady, Mennatallah
Format: Preprint
Published: 2024
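Subjects: Computation and Language; Artificial Intelligence; Human-Computer Interaction; Machine Learning; I.2.7; H.5.2
Online Access: https://arxiv.org/abs/2405.00708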
author Cheng, Furui
Zouhar, Vilém
Chan, Robin Shing Moon
Fürst, Daniel
Strobelt, Hendrik
El-Assady, Mennatallah
contents Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution scores, which are then integrated with concrete examples in a table-based visualization, supporting dynamic analysis of model behavior. A user study with LLM practitioners and interviews with experts demonstrate the system's usability and effectiveness, emphasizing the importance of involving humans in the explanation process as active participants rather than passive recipients.
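The abstract's idea of computing feature attribution scores from removal-based counterfactuals can be illustrated with a minimal sketch. This is not the authors' algorithm: the segment list, the `toy_score` function (a hypothetical stand-in for querying a model), and the exhaustive enumeration of keep/remove masks are all assumptions made for illustration. In particular, the sketch does nothing to keep counterfactuals fluent and does not cover replacement operations.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple


def removal_counterfactuals(segments: List[str]) -> List[Tuple[Tuple[bool, ...], str]]:
    """Enumerate every keep/remove mask over the segments and the resulting text."""
    results = []
    for mask in product([True, False], repeat=len(segments)):
        text = " ".join(seg for seg, keep in zip(segments, mask) if keep)
        results.append((mask, text))
    return results


def attribution_scores(
    segments: List[str], score: Callable[[str], float]
) -> Dict[str, float]:
    """Attribution of a segment: mean score of the counterfactuals that keep it
    minus mean score of the counterfactuals that remove it."""
    scored = [(mask, score(text)) for mask, text in removal_counterfactuals(segments)]
    out = {}
    for i, seg in enumerate(segments):
        kept = [s for mask, s in scored if mask[i]]
        removed = [s for mask, s in scored if not mask[i]]
        out[seg] = mean(kept) - mean(removed)
    return out


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a model output: 1.0 if the text mentions 'fever'."""
    return 1.0 if "fever" in text else 0.0


# User-defined granularity: here the input is split into three phrase-level segments.
segments = ["The patient has a fever", "and a mild cough", "since Monday"]
scores = attribution_scores(segments, toy_score)
for seg, value in scores.items():
    print(f"{value:+.2f}  {seg}")
```

Exhaustive enumeration grows as 2^n in the number of segments, so it is only workable for a handful of coarse segments; a practical system would need to sample or prune the counterfactual space.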
format Preprint
id arxiv_https___arxiv_org_abs_2405_00708
institution arXiv
publishDate 2024
record_format arxiv
title Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis
topic Computation and Language
Artificial Intelligence
Human-Computer Interaction
Machine Learning
I.2.7; H.5.2
url https://arxiv.org/abs/2405.00708