MARC21 :: Library Catalog

Bibliographic Details
Main authors:	Chou, Hsuan-Yu; Naveed, Wajiha; Zhou, Shuyan; Yang, Xiaowei
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Human-Computer Interaction; Machine Learning; Social and Information Networks
Online access:	https://arxiv.org/abs/2602.05189

author	Chou, Hsuan-Yu; Naveed, Wajiha; Zhou, Shuyan; Yang, Xiaowei
contents	As internet access expands, so does exposure to harmful content, increasing the need for effective moderation. Research has demonstrated that large language models (LLMs) can be used effectively for social media moderation tasks, including harmful content detection. While proprietary LLMs have been shown to outperform traditional machine learning models in zero-shot settings, the out-of-the-box capability of open-weight LLMs remains an open question. Motivated by recent developments in reasoning LLMs, we evaluate seven state-of-the-art models: four proprietary and three open-weight. Testing with real-world posts on Bluesky, moderation decisions by the Bluesky Moderation Service, and annotations by two of the authors, we find considerable overlap between the sensitivity (81%–97%) and specificity (91%–100%) of the open-weight LLMs and those of the proprietary ones (72%–98% and 93%–99%, respectively). Additionally, our analysis reveals that specificity exceeds sensitivity for rudeness detection, but the opposite holds for intolerance and threats. Lastly, we analyze inter-rater agreement between human moderators and the LLMs, highlighting considerations for deploying LLMs in both platform-scale and personalized moderation contexts. These findings show that open-weight LLMs can support privacy-preserving moderation on consumer-grade hardware and suggest new directions for designing moderation systems that balance community values with individual user preferences.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_05189
institution	arXiv
publishDate	2026
record_format	arxiv
title	Are Open-Weight LLMs Ready for Social Media Moderation? A Comparative Study on Bluesky
topic	Computation and Language; Human-Computer Interaction; Machine Learning; Social and Information Networks
url	https://arxiv.org/abs/2602.05189

Similar Items