Staff View: The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing :: Library Catalog

Bibliographic Details
Main Authors:	Kaneko, Masahiro; Bollegala, Danushka; Baldwin, Timothy
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.08511

_version_	1866916093453402112
author	Kaneko, Masahiro; Bollegala, Danushka; Baldwin, Timothy
author_facet	Kaneko, Masahiro; Bollegala, Danushka; Baldwin, Timothy
contents	The output tendencies of Pre-trained Language Models (PLMs) vary markedly before and after Fine-Tuning (FT) due to the updates to the model parameters. These divergences in output tendencies result in a gap in the social biases of PLMs. For example, there exists a low correlation between the intrinsic bias scores of a PLM and its extrinsic bias scores under FT-based debiasing methods. Additionally, applying FT-based debiasing methods to a PLM leads to a decline in performance on downstream tasks. On the other hand, PLMs trained on large datasets can learn without parameter updates via In-Context Learning (ICL) using prompts. ICL induces smaller changes to PLMs than FT-based debiasing methods do. Therefore, we hypothesize that the gap observed between pre-trained and FT models does not hold for debiasing methods that use ICL. In this study, we demonstrate that ICL-based debiasing methods show a higher correlation between intrinsic and extrinsic bias scores than FT-based methods. Moreover, the performance degradation due to debiasing is also lower in the ICL case than in the FT case.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_08511
institution	arXiv
publishDate	2024
record_format	arxiv
title	The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing
topic	Computation and Language
url	https://arxiv.org/abs/2401.08511

Similar Items