Bibliographic Details
Main Authors: Tang, Xunzhu, Chen, Zhenghan, Kim, Kisub, Tian, Haoye, Ezzini, Saad, Klein, Jacques
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2312.01241
contents Open-source code is pervasive. In this setting, embedded vulnerabilities are spreading to downstream software at an alarming rate. While such vulnerabilities are generally identified and addressed rapidly, inconsistent maintenance policies may cause security patches to go unnoticed. Indeed, security patches can be "silent", i.e., they do not always come with comprehensive advisories such as CVEs. This lack of transparency leaves users oblivious to available security updates, providing ample opportunity for attackers to exploit unpatched vulnerabilities. Consequently, identifying silent security patches just in time, as they are released, is essential for preventing n-day attacks and for ensuring robust and secure maintenance practices. With LLMDA, we propose to (1) leverage large language models (LLMs) to augment patch information with generated code change explanations, (2) design a representation learning approach that explores code-text alignment methodologies for feature combination, (3) implement label-wise training with labelled instructions for guiding the embedding based on security relevance, and (4) rely on a probabilistic batch contrastive learning mechanism for building a high-precision identifier of security patches. We evaluate LLMDA on the PatchDB and SPI-DB literature datasets and show that our approach substantially improves over the state of the art, notably outperforming GraphSPD by 20% in terms of F-Measure on the SPI-DB benchmark.
format Preprint
id arxiv_https___arxiv_org_abs_2312_01241
institution arXiv
publishDate 2023
record_format arxiv
title Just-in-Time Detection of Silent Security Patches
topic Cryptography and Security
Artificial Intelligence
url https://arxiv.org/abs/2312.01241
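The abstract names a batch contrastive learning mechanism but this record gives no details of it. As a rough illustration of the general idea only, the sketch below computes a generic supervised contrastive loss over one batch of patch embeddings: patches sharing a label (security or non-security) are pulled together, others are pushed apart. This is not the LLMDA formulation; the function names, the cosine similarity choice, and the `temperature` value are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def batch_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (hypothetical helper).

    embeddings: list of patch embedding vectors (lists of floats)
    labels:     list of labels, e.g. 1 = security patch, 0 = non-security
    Returns the mean loss over anchors that have at least one same-label
    partner in the batch, or 0.0 if no anchor has one.
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of picking a same-label sample.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings: two near [1, 0], two near [0, 1].
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(batch_contrastive_loss(batch, [1, 1, 0, 0]))  # labels match clusters
    print(batch_contrastive_loss(batch, [1, 0, 1, 0]))  # labels cut across clusters
```

With the toy batch, the loss is lower when the labels agree with the geometric clusters than when they cut across them, which is the signal a contrastive objective uses to shape the embedding space.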