Staff View: TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors :: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Jingyi; Wang, Junfeng; Sun, Zhen; Dong, Wenhan; Liu, Yule; He, Xinlei
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.08708

_version_	1866917955555557376
author	Zheng, Jingyi; Wang, Junfeng; Sun, Zhen; Dong, Wenhan; Liu, Yule; He, Xinlei
author_facet	Zheng, Jingyi; Wang, Junfeng; Sun, Zhen; Dong, Wenhan; Liu, Yule; He, Xinlei
contents	As Large Language Models (LLMs) advance, Machine-Generated Texts (MGTs) have become increasingly fluent, high-quality, and informative. Existing wide-range MGT detectors are designed to identify MGTs to prevent the spread of plagiarism and misinformation. However, adversaries attempt to humanize MGTs to evade detection (named evading attacks), which requires only minor modifications to bypass MGT detectors. Unfortunately, existing attacks generally lack a unified and comprehensive evaluation framework, as they are assessed using different experimental settings, model architectures, and datasets. To fill this gap, we introduce the Text-Humanization Benchmark (TH-Bench), the first comprehensive benchmark to evaluate evading attacks against MGT detectors. TH-Bench evaluates attacks across three key dimensions: evading effectiveness, text quality, and computational overhead. Our extensive experiments evaluate 6 state-of-the-art attacks against 13 MGT detectors across 6 datasets, spanning 19 domains and generated by 11 widely used LLMs. Our findings reveal that no single evading attack excels across all three dimensions. Through in-depth analysis, we highlight the strengths and limitations of different attacks. More importantly, we identify a trade-off among three dimensions and propose two optimization insights. Through preliminary experiments, we validate their correctness and effectiveness, offering potential directions for future research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_08708
institution	arXiv
publishDate	2025
record_format	arxiv
title	TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
topic	Cryptography and Security; Artificial Intelligence
url	https://arxiv.org/abs/2503.08708

Similar Items