Staff View: Efficient and Universal Watermarking for LLM-Generated Code Detection :: Library Catalog

Bibliographic Details
Main Authors:	Li, Boquan; Fu, Zirui; Zhang, Mengdi; Zhang, Peixin; Sun, Jun; Wang, Xingmei
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2402.07518

_version_	1866915420724789248
author	Li, Boquan; Fu, Zirui; Zhang, Mengdi; Zhang, Peixin; Sun, Jun; Wang, Xingmei
author_facet	Li, Boquan; Fu, Zirui; Zhang, Mengdi; Zhang, Peixin; Sun, Jun; Wang, Xingmei
contents	Large language models (LLMs) have significantly enhanced the usability of AI-generated code, providing effective assistance to programmers. This advancement also raises ethical and legal concerns, such as academic dishonesty or the generation of malicious code. For accountability, it is imperative to detect whether a piece of code is AI-generated. Watermarking is broadly considered a promising solution and has been successfully applied to identify LLM-generated text. However, existing efforts on code are far from ideal, suffering from limited universality and excessive time and memory consumption. In this work, we propose a plug-and-play watermarking approach for AI-generated code detection, named ACW (AI Code Watermarking). ACW is training-free and works by selectively applying a set of carefully designed, semantic-preserving, and idempotent code transformations to LLM code outputs. The presence or absence of the transformations serves as implicit watermarks, enabling the detection of AI-generated code. Our experimental results show that ACW effectively detects AI-generated code, preserves code utility, and is resilient against code optimizations. In particular, ACW is efficient and universal across different LLMs, addressing the limitations of existing approaches.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_07518
institution	arXiv
publishDate	2024
record_format	arxiv
title	Efficient and Universal Watermarking for LLM-Generated Code Detection
topic	Cryptography and Security
url	https://arxiv.org/abs/2402.07518

Similar Items