Staff View: Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique :: Library Catalog

Bibliographic Details
Main Authors:	Li, Yanming; Eichler, Cédric; Anciaux, Nicolas; Bensamoun, Alexandra; Manzano, Lorena Gonzalez; Ghozzi, Seifeddine
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.09655

_version_	1866914320384786432
author	Li, Yanming; Eichler, Cédric; Anciaux, Nicolas; Bensamoun, Alexandra; Manzano, Lorena Gonzalez; Ghozzi, Seifeddine
author_facet	Li, Yanming; Eichler, Cédric; Anciaux, Nicolas; Bensamoun, Alexandra; Manzano, Lorena Gonzalez; Ghozzi, Seifeddine
contents	We propose a system for marking sensitive or copyrighted texts to detect their use in fine-tuning large language models under black-box access with statistical guarantees. Our method builds digital "marks" using invisible Unicode characters organized into ("cue", "reply") pairs. During an audit, prompts containing only "cue" fragments are issued to trigger regurgitation of the corresponding "reply", indicating document usage. To control false positives, we compare against held-out counterfactual marks and apply a ranking test, yielding a verifiable bound on the false positive rate. The approach is minimally invasive, scalable across many sources, robust to standard processing pipelines, and achieves high detection power even when marked data is a small fraction of the fine-tuning corpus.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_09655
institution	arXiv
publishDate	2025
record_format	arxiv
title	Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique
topic	Cryptography and Security; Artificial Intelligence
url	https://arxiv.org/abs/2510.09655
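
Note on the contents field: the abstract describes invisible-character ("cue", "reply") marks and a ranking test against held-out counterfactual marks. The following is only a minimal sketch of how such a ranking test can bound the false positive rate, not the authors' implementation. The choice of invisible code points, the mark lengths, the function names (make_mark, rank_p_value, audit), and the idea of a per-mark numeric "score" (for example, how closely the model's output matches a reply when prompted with its cue) are all assumptions made for illustration; the record does not specify them.

```python
import random

# Hypothetical alphabet of invisible code points (an assumption, not from the record):
# zero-width space, zero-width non-joiner, zero-width joiner, word joiner.
INVISIBLE = ["\u200b", "\u200c", "\u200d", "\u2060"]


def make_mark(rng, cue_len=8, reply_len=8):
    """Return a random (cue, reply) pair built only from invisible characters."""
    cue = "".join(rng.choice(INVISIBLE) for _ in range(cue_len))
    reply = "".join(rng.choice(INVISIBLE) for _ in range(reply_len))
    return cue, reply


def rank_p_value(true_score, counterfactual_scores):
    """Rank-based p-value of the embedded mark among held-out counterfactual marks.

    If the model never saw the marked text, the embedded mark is exchangeable
    with the counterfactual ones, so its rank is uniform and this p-value is
    valid. Ties are counted against the embedded mark (conservative).
    """
    at_least = sum(1 for s in counterfactual_scores if s >= true_score)
    return (1 + at_least) / (1 + len(counterfactual_scores))


def audit(true_score, counterfactual_scores, alpha=0.05):
    """Flag usage when the rank p-value is at most alpha (false positive rate <= alpha)."""
    return rank_p_value(true_score, counterfactual_scores) <= alpha
```

With K counterfactual marks the smallest attainable p-value is 1/(K+1), so under this sketch at least 19 counterfactual marks are needed before a detection at alpha = 0.05 is possible.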

Similar Items