Staff View: Impromptu Cybercrime Euphemism Detection :: Library Catalog

Bibliographic Details
Main Authors:	Li, Xiang; Zhou, Yucheng; Zhao, Laiping; Li, Jing; Liu, Fangming
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.01413

_version_	1866917854183424000
author	Li, Xiang; Zhou, Yucheng; Zhao, Laiping; Li, Jing; Liu, Fangming
author_facet	Li, Xiang; Zhou, Yucheng; Zhao, Laiping; Li, Jing; Liu, Fangming
contents	Detecting euphemisms is essential for content security on social media platforms, but existing euphemism detection methods are ineffective on impromptu euphemisms. In this work, we make a first attempt at exploring impromptu euphemism detection and introduce the Impromptu Cybercrime Euphemisms Detection (ICED) dataset. Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training. Our detection framework mainly consists of a coarse-grained and a fine-grained classification model. The coarse-grained classification model removes most of the harmless content in the corpus to be detected. The fine-grained model, the impromptu euphemism detector, integrates context augmentation and multi-round iterative training to better predict the actual meaning of a masked token. In addition, we leverage ChatGPT to evaluate the model's capability. Experimental results demonstrate that our approach achieves a remarkable 76-fold improvement compared to the previous state-of-the-art euphemism detector.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_01413
institution	arXiv
publishDate	2024
record_format	arxiv
title	Impromptu Cybercrime Euphemism Detection
topic	Computation and Language
url	https://arxiv.org/abs/2412.01413

Similar Items