Bibliographic Details
Main Authors: Masrour, Elyas; Emi, Bradley; Spero, Max
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2501.03437
author Masrour, Elyas; Emi, Bradley; Spero, Max
contents AI humanizers are a new class of online software tools that paraphrase and rewrite AI-generated text so that it evades AI detection software. We study 19 AI humanizer and paraphrasing tools and qualitatively assess their effects on the text and how faithfully they preserve the meaning of the original. We show that many existing AI detectors fail to detect humanized text. Finally, we demonstrate a robust model that detects humanized AI text while maintaining a low false positive rate, using a data-centric augmentation approach. We attack our own detector by fine-tuning our own model to optimize against the detector's predictions, and show that the detector's cross-humanizer generalization is sufficient for it to remain robust to this attack.
format Preprint
id arxiv_https___arxiv_org_abs_2501_03437
institution arXiv
publishDate 2025
record_format arxiv
title DAMAGE: Detecting Adversarially Modified AI Generated Text
topic Computation and Language
url https://arxiv.org/abs/2501.03437