Bibliographic Details
Main Authors: Pang, Qi; Hu, Shengyuan; Zheng, Wenting; Smith, Virginia
Format: Preprint
Published: 2024
Subjects: Cryptography and Security; Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2402.16187
author Pang, Qi
Hu, Shengyuan
Zheng, Wenting
Smith, Virginia
contents Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack -- leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.
format Preprint
id arxiv_https___arxiv_org_abs_2402_16187
institution arXiv
publishDate 2024
record_format arxiv
title No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
topic Cryptography and Security
Computation and Language
Machine Learning