Saved in:
| Main Authors: | , , , , , , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.16525 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Table of Contents:
- Large language model-generated code (LLMgCode) has become increasingly common in software development. So far LLMgCode has more quality issues than human-authored code (HaCode). It is common for LLMgCode to mix with HaCode in a code change, while the change is signed by only human developers, without being carefully examined. Many automated methods have been proposed to detect LLMgCode from HaCode, in which the perplexity-based method (PERPLEXITY for short) is the state-of-the-art method. However, the efficacy evaluation of PERPLEXITY has focused on detection accuracy. Yet it is unclear whether PERPLEXITY is good enough in a wider range of realistic evaluation settings. To this end, we carry out a family of experiments to compare PERPLEXITY against feature- and pre-training-based methods from three perspectives: detection accuracy, detection speed, and generalization capability. The experimental results show that PERPLEXITY has the best generalization capability while having limited detection accuracy and detection speed. Based on that, we discuss the strengths and limitations of PERPLEXITY, e.g., PERPLEXITY is unsuitable for high-level programming languages. Finally, we provide recommendations to improve PERPLEXITY and apply it in practice. As the first large-scale investigation on detecting LLMgCode from HaCode, this article provides a wide range of findings for future improvement.