| Main Authors: | Chen, Jennifer L., Ladhak, Faisal, Li, Daniel, Elhadad, Noémie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.06213 |
Similar Items
SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval
by: Adams, Griffin, et al.
Published: (2024)
HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations
by: Hu, Yujia, et al.
Published: (2026)
An Effective, Robust and Fairness-aware Hate Speech Detection Framework
by: Mou, Guanyi, et al.
Published: (2024)
HateDebias: On the Diversity and Variability of Hate Speech Debiasing
by: Wu, Hongyan, et al.
Published: (2024)
Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation
by: Ibrahim, Muhammad Amien, et al.
Published: (2025)
"Is Hate Lost in Translation?": Evaluation of Multilingual LGBTQIA+ Hate Speech Detection
by: Chan, Fai Leui, et al.
Published: (2024)
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
by: Wang, Yifan, et al.
Published: (2025)
Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech
by: Yadav, Neemesh, et al.
Published: (2024)
Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
by: Calabrese, Agostina, et al.
Published: (2024)
PEACE 2.0: Grounded Explanations and Counter-Speech for Combating Hate Expressions
by: Damo, Greta, et al.
Published: (2026)
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
by: Nandi, Palash, et al.
Published: (2024)
Aligning Large Language Models via Fine-grained Supervision
by: Xu, Dehong, et al.
Published: (2024)
Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models
by: Yuan, Shuzhou, et al.
Published: (2025)
MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles
by: Ganguly, Amrita, et al.
Published: (2024)
Aligning Attention with Human Rationales for Self-Explaining Hate Speech Detection
by: Eilertsen, Brage, et al.
Published: (2025)
Compositional Generalisation for Explainable Hate Speech Detection
by: Calabrese, Agostina, et al.
Published: (2025)
Automatic Textual Normalization for Hate Speech Detection
by: Nguyen, Anh Thi-Hoang, et al.
Published: (2023)
Advancing Hate Speech Detection with Transformers: Insights from the MetaHate
by: Chapagain, Santosh, et al.
Published: (2025)
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
by: Sun, Simeng, et al.
Published: (2025)
Algorithmic Fairness in NLP: Persona-Infused LLMs for Human-Centric Hate Speech Detection
by: Gajewska, Ewelina, et al.
Published: (2025)
NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data
by: Tonneau, Manuel, et al.
Published: (2024)
When Hate Meets Facts: LLMs-in-the-Loop for Check-worthiness Detection in Hate Speech
by: Ocampo, Nicolás Benjamín, et al.
Published: (2026)
MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection
by: Piot, Paloma, et al.
Published: (2024)
Hate Speech Detection with Generalizable Target-aware Fairness
by: Chen, Tong, et al.
Published: (2024)
HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
by: Proskurina, Irina, et al.
Published: (2025)
Hate Speech According to the Law: An Analysis for Effective Detection
by: Korre, Katerina, et al.
Published: (2024)
Self-Explaining Hate Speech Detection with Moral Rationales
by: Vargas, Francielle, et al.
Published: (2026)
Code-Mixed Telugu-English Hate Speech Detection
by: Kakarla, Santhosh, et al.
Published: (2025)
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
by: Jin, Yiping, et al.
Published: (2024)
Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
by: Bui, Minh Duc, et al.
Published: (2024)
Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection
by: Muscato, Benedetta, et al.
Published: (2026)
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
by: Nghiem, Huy, et al.
Published: (2024)
HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models
by: Sen, Tanmay, et al.
Published: (2024)
EkoHate: Abusive Language and Hate Speech Detection for Code-switched Political Discussions on Nigerian Twitter
by: Ilevbare, Comfort Eseohen, et al.
Published: (2024)
Decoding Hate: Exploring Language Models' Reactions to Hate Speech
by: Piot, Paloma, et al.
Published: (2024)
X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework
by: Rehman, Mohammad Zia Ur, et al.
Published: (2026)
Towards Fairness Assessment of Dutch Hate Speech Detection
by: Bauer, Julie, et al.
Published: (2025)
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection
by: Masud, Sarah, et al.
Published: (2024)
xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection
by: Girón, Adrián, et al.
Published: (2026)
Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification
by: Wei, Lu, et al.
Published: (2025)