Bibliographic Details
Main Authors: Yi, Peiling, Zubiaga, Arkaitz
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2402.16458
author Yi, Peiling
Zubiaga, Arkaitz
contents Swear words are a common proxy for collecting datasets of cyberbullying incidents. Our focus is on measuring and mitigating the biases that arise from spurious associations between swear words and incidents as a result of such data collection strategies. After demonstrating and quantifying these biases, we introduce ID-XCB, the first data-independent debiasing technique that combines adversarial training, bias constraints and debiasing fine-tuning to reduce model attention to bias-inducing words without impacting overall model performance. We evaluate ID-XCB on two popular session-based cyberbullying datasets, along with comprehensive ablation and generalisation studies. We show that ID-XCB learns robust cyberbullying detection capabilities while mitigating biases, outperforming state-of-the-art debiasing methods in both performance and bias mitigation. Our quantitative and qualitative analyses demonstrate its generalisability to unseen data.
format Preprint
id arxiv_https___arxiv_org_abs_2402_16458
institution arXiv
publishDate 2024
record_format arxiv
title ID-XCB: Data-independent Debiasing for Fair and Accurate Transformer-based Cyberbullying Detection
topic Computation and Language
url https://arxiv.org/abs/2402.16458