Bibliographic Details
Main Authors: Kaneko, Masahiro; Bollegala, Danushka; Baldwin, Timothy
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2401.08511
author Kaneko, Masahiro
Bollegala, Danushka
Baldwin, Timothy
contents The output tendencies of Pre-trained Language Models (PLMs) vary markedly before and after Fine-Tuning (FT) due to the updates to the model parameters. These divergences in output tendencies result in a gap in the social biases of PLMs. For example, there exists a low correlation between the intrinsic bias scores of a PLM and its extrinsic bias scores under FT-based debiasing methods. Additionally, applying FT-based debiasing methods to a PLM leads to a decline in performance in downstream tasks. On the other hand, PLMs trained on large datasets can learn without parameter updates via In-Context Learning (ICL) using prompts. ICL induces smaller changes to PLMs compared to FT-based debiasing methods. Therefore, we hypothesize that the gap observed between pre-trained and FT models does not hold for debiasing methods that use ICL. In this study, we demonstrate that ICL-based debiasing methods show a higher correlation between intrinsic and extrinsic bias scores compared to FT-based methods. Moreover, the performance degradation due to debiasing is also lower in the ICL case than in the FT case.
format Preprint
id arxiv_https___arxiv_org_abs_2401_08511
institution arXiv
publishDate 2024
record_format arxiv
title The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing
topic Computation and Language
url https://arxiv.org/abs/2401.08511