Bibliographic Details
Main Authors: Kuzmin, Gleb; Yadav, Neemesh; Smirnov, Ivan; Baldwin, Timothy; Shelmanov, Artem
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2407.19345
_version_ 1866917950548606976
author Kuzmin, Gleb
Yadav, Neemesh
Smirnov, Ivan
Baldwin, Timothy
Shelmanov, Artem
contents We propose selective debiasing, an inference-time safety mechanism designed to improve overall model quality in terms of both prediction performance and fairness, especially in scenarios where retraining the model is impractical. The method draws inspiration from selective classification, in which low-quality predictions, as indicated by their uncertainty scores, are discarded at inference time. Instead of discarding potentially biased predictions, our approach identifies them and removes their bias using LEACE, a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories.
format Preprint
id arxiv_https___arxiv_org_abs_2407_19345
institution arXiv
publishDate 2024
record_format arxiv
title Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2407.19345
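
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that two sets of class probabilities are already available for each input: one from the original classifier and one from a debiased variant (for example, the same classifier applied to LEACE-processed representations; LEACE itself is not implemented here). It also assumes that the bias score is the KL divergence between those two distributions and that a fixed fraction of the highest-scoring predictions is replaced. The function names, the direction of the divergence, the `fraction` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) for arrays of shape (n_samples, n_classes)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=1)


def selective_debias(probs_orig, probs_debiased, fraction=0.2):
    """Replace the most 'suspicious' predictions with their debiased versions.

    Assumption: the bias score is KL(debiased || original); the paper may
    define the score differently.
    """
    scores = kl_divergence(probs_debiased, probs_orig)
    n_select = int(round(fraction * len(scores)))

    # Mark the n_select predictions with the highest bias score.
    selected = np.zeros(len(scores), dtype=bool)
    if n_select > 0:
        selected[np.argsort(-scores)[:n_select]] = True

    # Keep original probabilities except where a prediction was selected.
    final_probs = np.where(selected[:, None], probs_debiased, probs_orig)
    return final_probs.argmax(axis=1), selected, scores


# Toy binary-classification example (made-up numbers).
probs_orig = np.array([[0.90, 0.10], [0.60, 0.40], [0.20, 0.80], [0.55, 0.45]])
probs_debiased = np.array([[0.88, 0.12], [0.30, 0.70], [0.25, 0.75], [0.50, 0.50]])

preds, selected, scores = selective_debias(probs_orig, probs_debiased, fraction=0.25)
print(preds.tolist())     # [0, 1, 1, 0]
print(selected.tolist())  # [False, True, False, False]
```

In this toy run only the second prediction, where the two distributions disagree most, is swapped for its debiased counterpart; the other three keep the original model's output. Replacing a fixed fraction is one possible selection rule; a score threshold would be an equally plausible choice under the same assumptions.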