MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Masrour, Elyas; Emi, Bradley; Spero, Max
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2501.03437

_version_	1866909449796452352
author	Masrour, Elyas; Emi, Bradley; Spero, Max
author_facet	Masrour, Elyas; Emi, Bradley; Spero, Max
contents	AI humanizers are a new class of online software tools meant to paraphrase and rewrite AI-generated text in a way that allows them to evade AI detection software. We study 19 AI humanizer and paraphrasing tools and qualitatively assess their effects and faithfulness in preserving the meaning of the original text. We show that many existing AI detectors fail to detect humanized text. Finally, we demonstrate a robust model that can detect humanized AI text while maintaining a low false positive rate using a data-centric augmentation approach. We attack our own detector, training our own fine-tuned model optimized against our detector's predictions, and show that our detector's cross-humanizer generalization is sufficient to remain robust to this attack.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_03437
institution	arXiv
publishDate	2025
record_format	arxiv
title	DAMAGE: Detecting Adversarially Modified AI Generated Text
topic	Computation and Language
url	https://arxiv.org/abs/2501.03437

Similar Items