Staff View: Explain the Flag: Contextualizing Hate Speech Beyond Censorship :: Library Catalog

Bibliographic Details
Main Authors:	Liartis, Jason; Kaldeli, Eirini; Gyftokosta, Lambrini; Chelioudakis, Eleftherios; Mastromichalakis, Orfeas Menis
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.14970

_version_	1866910135445618688
author	Liartis, Jason; Kaldeli, Eirini; Gyftokosta, Lambrini; Chelioudakis, Eleftherios; Mastromichalakis, Orfeas Menis
author_facet	Liartis, Jason; Kaldeli, Eirini; Gyftokosta, Lambrini; Chelioudakis, Eleftherios; Mastromichalakis, Orfeas Menis
contents	Hate, derogatory, and offensive speech remains a persistent challenge in online platforms and public discourse. While automated detection systems are widely used, most focus on censorship or removal, raising concerns for transparency and freedom of expression, and limiting opportunities to explain why content is harmful. To address these issues, explanatory approaches have emerged as a promising solution, aiming to make hate speech detection more transparent, accountable, and informative. In this paper, we present a hybrid approach that combines Large Language Models (LLMs) with three newly created and curated vocabularies to detect and explain hate speech in English, French, and Greek. Our system captures both inherently derogatory expressions tied to identity characteristics and direct group-targeted content through two complementary pipelines: one that detects and disambiguates problematic terms using the curated vocabularies, and one that leverages LLMs as context-aware evaluators of group-targeting content. The outputs are fused into grounded explanations that clarify why content is flagged. Human evaluation shows that our hybrid approach is accurate, with high-quality explanations, outperforming LLM-only baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_14970
institution	arXiv
publishDate	2026
record_format	arxiv
title	Explain the Flag: Contextualizing Hate Speech Beyond Censorship
topic	Computation and Language
url	https://arxiv.org/abs/2604.14970

Similar Items