| Main Authors: | Yi, Peiling; Zubiaga, Arkaitz |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2402.16458 |
| _version_ | 1866909120819363840 |
|---|---|
| author | Yi, Peiling; Zubiaga, Arkaitz |
| contents | Swear words are a common proxy for collecting datasets of cyberbullying incidents. Our focus is on measuring and mitigating biases derived from spurious associations between swear words and incidents that arise from such data collection strategies. After demonstrating and quantifying these biases, we introduce ID-XCB, the first data-independent debiasing technique, which combines adversarial training, bias constraints and a debias fine-tuning approach to reduce model attention to bias-inducing words without impacting overall model performance. We evaluate ID-XCB on two popular session-based cyberbullying datasets, along with comprehensive ablation and generalisation studies. We show that ID-XCB learns robust cyberbullying detection capabilities while mitigating biases, outperforming state-of-the-art debiasing methods in both performance and bias mitigation. Our quantitative and qualitative analyses demonstrate its generalisability to unseen data. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_16458 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | ID-XCB: Data-independent Debiasing for Fair and Accurate Transformer-based Cyberbullying Detection |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2402.16458 |