Staff View: Active Slice Discovery in Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Minhui; Ijner, Prahar; Wald, Yoav; Creager, Elliot
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.20713

_version_	1866915637903753216
author	Zhang, Minhui; Ijner, Prahar; Wald, Yoav; Creager, Elliot
author_facet	Zhang, Minhui; Ijner, Prahar; Wald, Yoav; Creager, Elliot
contents	Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic, where a model does poorly in identifying toxic comments regarding that demographic. Identifying error slices is crucial to understanding and improving models, but it is also challenging. An appealing approach to reduce the amount of manual annotation required is to actively group errors that are likely to belong to the same slice, while using limited access to an annotator to verify whether the chosen samples share the same pattern of model mistake. In this paper, we formalize this approach as Active Slice Discovery and explore it empirically on a problem of discovering human-defined slices in toxicity classification. We examine the efficacy of active slice discovery under different choices of feature representations and active learning algorithms. On several slices, we find that uncertainty-based active learning algorithms are most effective, achieving competitive accuracy using 2-10% of the available slice membership information, while significantly outperforming baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_20713
institution	arXiv
publishDate	2025
record_format	arxiv
title	Active Slice Discovery in Large Language Models
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2511.20713

Similar Items