Bibliographic Details
Main Authors: Li, Boquan, Fu, Zirui, Zhang, Mengdi, Zhang, Peixin, Sun, Jun, Wang, Xingmei
Format: Preprint
Published: 2024
Subjects: Cryptography and Security
Online Access: https://arxiv.org/abs/2402.07518
contents Large language models (LLMs) have significantly enhanced the usability of AI-generated code, providing effective assistance to programmers. This advancement also raises ethical and legal concerns, such as academic dishonesty and the generation of malicious code. For accountability, it is imperative to detect whether a piece of code is AI-generated. Watermarking is broadly considered a promising solution and has been successfully applied to identify LLM-generated text. However, existing efforts on code are far from ideal, suffering from limited universality and excessive time and memory consumption. In this work, we propose a plug-and-play watermarking approach for AI-generated code detection, named ACW (AI Code Watermarking). ACW is training-free and works by selectively applying a set of carefully designed, semantics-preserving, and idempotent code transformations to LLM code outputs. The presence or absence of the transformations serves as an implicit watermark, enabling the detection of AI-generated code. Our experimental results show that ACW effectively detects AI-generated code, preserves code utility, and is resilient against code optimizations. In particular, ACW is efficient and universal across different LLMs, addressing the limitations of existing approaches.
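The transformation-based idea described in the abstract can be illustrated with a toy sketch. This is not the ACW implementation: the single rewrite rule below (turning `a if not c else b` into `b if c else a`) and the names `FlipNegatedTernary`, `embed`, `has_site`, and `detect` are hypothetical and chosen only for illustration. The rule preserves behavior and is idempotent, so code that is already a fixed point of the rewrite can be read as carrying the mark. The sketch assumes Python 3.9 or later for `ast.unparse`.

```python
import ast


class FlipNegatedTernary(ast.NodeTransformer):
    """Hypothetical rule: rewrite `a if not c else b` as `b if c else a`."""

    def visit_IfExp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        # Strip every leading `not`, swapping the branches each time, so a
        # second pass finds nothing left to rewrite (idempotence).
        while isinstance(node.test, ast.UnaryOp) and isinstance(node.test.op, ast.Not):
            node.test = node.test.operand
            node.body, node.orelse = node.orelse, node.body
        return node


def embed(code: str) -> str:
    """Apply the rule and return the source in ast.unparse's canonical form."""
    tree = FlipNegatedTernary().visit(ast.parse(code))
    return ast.unparse(ast.fix_missing_locations(tree))


def has_site(code: str) -> bool:
    """True if the code has at least one conditional expression to inspect."""
    return any(isinstance(n, ast.IfExp) for n in ast.walk(ast.parse(code)))


def detect(code: str) -> bool:
    """Treat code as marked when it has a site and re-applying the rule is a no-op."""
    return has_site(code) and embed(code) == code


raw = "y = a if not c else b"
marked = embed(raw)
print(marked)                       # the rewritten line
print(detect(marked), detect(raw))  # marked code is a fixed point, raw is not
```

A single rule like this would flag any human-written code that happens to be in the rewritten form already, which is presumably one reason the abstract speaks of a set of transformations applied selectively; the sketch only shows the fixed-point check that idempotence makes possible.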
format Preprint
id arxiv_https___arxiv_org_abs_2402_07518
institution arXiv
publishDate 2024
record_format arxiv
title Efficient and Universal Watermarking for LLM-Generated Code Detection
topic Cryptography and Security
url https://arxiv.org/abs/2402.07518