Bibliographic Details
Main Authors: Baldini, Ioana; Yadav, Chhavi; Nagireddy, Manish; Das, Payel; Varshney, Kush R.
Format: Preprint
Published: 2023
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2305.12620
author Baldini, Ioana
Yadav, Chhavi
Nagireddy, Manish
Das, Payel
Varshney, Kush R.
contents Bias auditing of language models (LMs) has received considerable attention as LMs are becoming widespread. As such, several benchmarks for bias auditing have been proposed. At the same time, the rapid evolution of LMs can make these benchmarks irrelevant in no time. Bias auditing is further complicated by LM brittleness: when a presumably biased outcome is observed, is it due to model bias or model brittleness? We propose enlisting the models themselves to help construct bias auditing datasets that remain challenging, and introduce bias measures that distinguish between different types of model errors. First, we extend an existing bias benchmark for NLI (BBNLI) using a combination of LM-generated lexical variations, adversarial filtering, and human validation. We demonstrate that the newly created dataset BBNLI-next is more challenging than BBNLI: on average, BBNLI-next reduces the accuracy of state-of-the-art NLI models from 95.3%, as observed by BBNLI, to a strikingly low 57.5%. Second, we employ BBNLI-next to showcase the interplay between robustness and bias: we point out shortcomings in current bias scores and propose bias measures that take into account both bias and model brittleness. Third, despite the fact that BBNLI-next was designed with non-generative models in mind, we show that the new dataset is also able to uncover bias in state-of-the-art open-source generative LMs. Note: All datasets included in this work are in English and they address US-centered social biases. In the spirit of efficient NLP research, no model training or fine-tuning was performed to conduct this research. Warning: This paper contains offensive text examples.
format Preprint
id arxiv_https___arxiv_org_abs_2305_12620
institution arXiv
publishDate 2023
record_format arxiv
title Keeping Up with the Language Models: Systematic Benchmark Extension for Bias Auditing
topic Computation and Language
url https://arxiv.org/abs/2305.12620