Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Gehao, Bagdasarian, Eugene, Zhai, Juan, Ma, Shiqing
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.05512
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908440298782720
author	Zhang, Gehao Bagdasarian, Eugene Zhai, Juan Ma, Shiqing
author_facet	Zhang, Gehao Bagdasarian, Eugene Zhai, Juan Ma, Shiqing
contents	Distinguishing AI-generated code from human-written code is becoming crucial for tasks such as authorship attribution, content tracking, and misuse detection. Based on this, N-gram-based watermarking schemes have emerged as prominent, which inject secret watermarks to be detected during the generation. However, their robustness in code content remains insufficiently evaluated. Most claims rely solely on defenses against simple code transformations or code optimizations as a simulation of attack, creating a questionable sense of robustness. In contrast, more sophisticated schemes already exist in the software engineering world, e.g., code obfuscation, which significantly alters code while preserving functionality. Although obfuscation is commonly used to protect intellectual property or evade software scanners, the robustness of code watermarking techniques against such transformations remains largely unexplored. In this work, we formally model the code obfuscation and prove the impossibility of N-gram-based watermarking's robustness with only one intuitive and experimentally verified assumption, distribution consistency, satisfied. Given the original false positive rate of the watermarking detection, the ratio that the detector failed on the watermarked code after obfuscation will increase to 1 - fpr. The experiments have been performed on three SOTA watermarking schemes, two LLMs, two programming languages, four code benchmarks, and four obfuscators. Among them, all watermarking detectors show coin-flipping detection abilities on obfuscated codes (AUROC tightly surrounds 0.5). Among all models, watermarking schemes, and datasets, both programming languages own obfuscators that can achieve attack effects with no detection AUROC higher than 0.6 after the attack. Based on the theoretical and practical observations, we also proposed a potential path of robust code watermarking.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_05512
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Disappearing Ink: Obfuscation Breaks N-gram Code Watermarks in Theory and Practice Zhang, Gehao Bagdasarian, Eugene Zhai, Juan Ma, Shiqing Cryptography and Security Artificial Intelligence Distinguishing AI-generated code from human-written code is becoming crucial for tasks such as authorship attribution, content tracking, and misuse detection. Based on this, N-gram-based watermarking schemes have emerged as prominent, which inject secret watermarks to be detected during the generation. However, their robustness in code content remains insufficiently evaluated. Most claims rely solely on defenses against simple code transformations or code optimizations as a simulation of attack, creating a questionable sense of robustness. In contrast, more sophisticated schemes already exist in the software engineering world, e.g., code obfuscation, which significantly alters code while preserving functionality. Although obfuscation is commonly used to protect intellectual property or evade software scanners, the robustness of code watermarking techniques against such transformations remains largely unexplored. In this work, we formally model the code obfuscation and prove the impossibility of N-gram-based watermarking's robustness with only one intuitive and experimentally verified assumption, distribution consistency, satisfied. Given the original false positive rate of the watermarking detection, the ratio that the detector failed on the watermarked code after obfuscation will increase to 1 - fpr. The experiments have been performed on three SOTA watermarking schemes, two LLMs, two programming languages, four code benchmarks, and four obfuscators. Among them, all watermarking detectors show coin-flipping detection abilities on obfuscated codes (AUROC tightly surrounds 0.5). Among all models, watermarking schemes, and datasets, both programming languages own obfuscators that can achieve attack effects with no detection AUROC higher than 0.6 after the attack. Based on the theoretical and practical observations, we also proposed a potential path of robust code watermarking.
title	Disappearing Ink: Obfuscation Breaks N-gram Code Watermarks in Theory and Practice
topic	Cryptography and Security Artificial Intelligence
url	https://arxiv.org/abs/2507.05512

Similar Items