Bibliographic Details
Main Authors: Jung, Minseok, Panizo, Cynthia Fuertes, Dugan, Liam, Fung, Yi R., Chen, Pin-Yu, Liang, Paul Pu
Format: Preprint
Published: 2025
Subjects: Computation and Language, Machine Learning
Online Access: https://arxiv.org/abs/2502.04528
author Jung, Minseok
Panizo, Cynthia Fuertes
Dugan, Liam
Fung, Yi R.
Chen, Pin-Yu
Liang, Paul Pu
contents The advancement of large language models (LLMs) has made it difficult to distinguish human-written text from AI-generated text. Several AI-text detectors have been developed in response; they typically apply a fixed global threshold (e.g., $\theta = 0.5$) to classify text as machine-generated. However, a single universal threshold can fail to account for distributional differences across subgroups. For example, with a fixed threshold, detectors make more false positive errors on shorter human-written texts and, among long texts, produce more positive classifications for neurotic writing styles. These discrepancies can lead to misclassifications that disproportionately affect certain groups. We address this limitation by introducing FairOPT, an algorithm for group-specific threshold optimization for probabilistic AI-text detectors. We partition data into subgroups based on attributes (e.g., text length and writing style) and use FairOPT to learn a decision threshold for each group that reduces this discrepancy. Across nine detectors and three heterogeneous datasets, FairOPT notably mitigates discrepancy and the minimax problem, decreasing overall discrepancy by 27.4% across five metrics while sacrificing only 0.005% accuracy. Our framework paves the way for more robust classification in AI-generated content detection via post-processing. We release our data, code, and project information at URL.
format Preprint
id arxiv_https___arxiv_org_abs_2502_04528
institution arXiv
publishDate 2025
record_format arxiv
title Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2502.04528
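
The abstract contrasts one global threshold with a separate decision threshold per subgroup. The following is a minimal sketch of that idea on invented toy data. It is not the FairOPT algorithm from the paper. It assumes a detector that outputs a probability that a text is AI-generated, uses text length ("short" vs. "long") as the only grouping attribute, and uses a simple grid search that matches each group's false positive rate to a common target. All function names, scores, and the target rate are hypothetical.

```python
from collections import defaultdict


def classify(scores, groups, thresholds, default=0.5):
    """Flag a text as AI-generated (1) when its score meets its group's threshold."""
    return [int(s >= thresholds.get(g, default)) for s, g in zip(scores, groups)]


def false_positive_rates(labels, preds, groups):
    """Per-group FPR: share of human-written texts (label 0) flagged as AI."""
    flagged = defaultdict(int)
    humans = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:
            humans[g] += 1
            flagged[g] += p
    return {g: flagged[g] / humans[g] for g in humans}


def fpr_gap(rates):
    """Largest difference in FPR between any two groups."""
    return max(rates.values()) - min(rates.values())


def fit_thresholds(scores, labels, groups, target_fpr, grid):
    """For each group, pick the grid threshold whose FPR is closest to target_fpr."""
    thresholds = {}
    for g in sorted(set(groups)):
        human_scores = [s for s, y, gg in zip(scores, labels, groups)
                        if gg == g and y == 0]
        best_err, best_t = None, None
        for t in grid:
            fpr = sum(s >= t for s in human_scores) / len(human_scores)
            err = abs(fpr - target_fpr)
            if best_err is None or err < best_err:
                best_err, best_t = err, t
        thresholds[g] = best_t
    return thresholds


# Invented detector scores: label 0 = human-written, 1 = AI-generated.
scores = [0.2, 0.4, 0.6, 0.7, 0.8, 0.9,   # short texts
          0.1, 0.2, 0.3, 0.4, 0.6, 0.9]   # long texts
labels = [0, 0, 0, 0, 1, 1,
          0, 0, 0, 0, 1, 1]
groups = ["short"] * 6 + ["long"] * 6

# Baseline: one global threshold of 0.5 for every group.
global_preds = classify(scores, groups, thresholds={})
gap_global = fpr_gap(false_positive_rates(labels, global_preds, groups))

# Group-specific thresholds fitted on the same toy data.
grid = [i / 10 for i in range(1, 10)]
group_thresholds = fit_thresholds(scores, labels, groups, target_fpr=0.25, grid=grid)
group_preds = classify(scores, groups, group_thresholds)
gap_group = fpr_gap(false_positive_rates(labels, group_preds, groups))

print("FPR gap with global threshold:", gap_global)   # 0.5
print("group thresholds:", group_thresholds)
print("FPR gap with group thresholds:", gap_group)    # 0.0
```

On this toy data the global threshold flags half of the short human-written texts and none of the long ones, while the fitted thresholds bring both groups to the same false positive rate. A real evaluation would fit thresholds on held-out data and track accuracy alongside several discrepancy metrics, as the abstract describes.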