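Staff View: Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis :: Library Catalog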

Bibliographic Details
Main Authors:	Cheng, Furui; Zouhar, Vilém; Chan, Robin Shing Moon; Fürst, Daniel; Strobelt, Hendrik; El-Assady, Mennatallah
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Human-Computer Interaction; Machine Learning; I.2.7; H.5.2
Online Access:	https://arxiv.org/abs/2405.00708

_version_	1866911095541727232
author	Cheng, Furui; Zouhar, Vilém; Chan, Robin Shing Moon; Fürst, Daniel; Strobelt, Hendrik; El-Assady, Mennatallah
author_facet	Cheng, Furui; Zouhar, Vilém; Chan, Robin Shing Moon; Fürst, Daniel; Strobelt, Hendrik; El-Assady, Mennatallah
contents	Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution scores, which are then integrated with concrete examples in a table-based visualization, supporting dynamic analysis of model behavior. A user study with LLM practitioners and interviews with experts demonstrate the system's usability and effectiveness, emphasizing the importance of involving humans in the explanation process as active participants rather than passive recipients.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_00708
institution	arXiv
publishDate	2024
record_format	arxiv
title	Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis
topic	Computation and Language; Artificial Intelligence; Human-Computer Interaction; Machine Learning; I.2.7; H.5.2
url	https://arxiv.org/abs/2405.00708

Similar Items