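Staff View: VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models :: Library Catalog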


Bibliographic Details
Main Authors:	Liu, Yu; Gao, Lang; Yang, Mingxin; Xie, Yu; Chen, Ping; Zhang, Xiaojin; Chen, Wei
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security; Artificial Intelligence; Software Engineering
Online Access:	https://arxiv.org/abs/2406.07595

_version_	1866914918465273856
author	Liu, Yu; Gao, Lang; Yang, Mingxin; Xie, Yu; Chen, Ping; Zhang, Xiaojin; Chen, Wei
contents	Large Language Models (LLMs) are trained on corpora containing large amounts of program code, which greatly improves their code comprehension and generation capabilities. However, sound, comprehensive research on detecting program vulnerabilities, a more specific code-related task, and on evaluating LLM performance in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, which makes it difficult for them to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_07595
institution	arXiv
publishDate	2024
record_format	arxiv
title	VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models
topic	Cryptography and Security; Artificial Intelligence; Software Engineering
url	https://arxiv.org/abs/2406.07595
