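Staff View: An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models :: Library Catalog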

Bibliographic Details
Main Authors:	Xian, Zixiang; Cui, Chenhui; Huang, Rubing; Fang, Chunrong; Chen, Zhenyu
Format:	Preprint
Published:	2024
Subjects:	Software Engineering; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.14644

_version_	1866912409854148608
author	Xian, Zixiang; Cui, Chenhui; Huang, Rubing; Fang, Chunrong; Chen, Zhenyu
author_facet	Xian, Zixiang; Cui, Chenhui; Huang, Rubing; Fang, Chunrong; Chen, Zhenyu
contents	The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code clustering. However, existing methods for source code embedding, including those based on LLMs, often rely on costly supervised training or fine-tuning for domain adaptation. This paper proposes a novel approach to embedding source code by combining large language models and sentence embedding models. This approach attempts to eliminate the need for task-specific training or fine-tuning and to effectively address the issue of erroneous information commonly found in LLM-generated outputs. To evaluate the performance of our proposed approach, we conducted a series of experiments on three datasets with different programming languages, considering various LLMs and sentence embedding models. The experimental results demonstrate the effectiveness and superiority of our approach over state-of-the-art unsupervised approaches such as SourcererCC, Code2vec, InferCode, TransformCode, and LLM2Vec. Our findings highlight the potential of our approach to advance the field of SE by providing robust and efficient solutions for source code embedding tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_14644
institution	arXiv
publishDate	2024
record_format	arxiv
title	An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models
topic	Software Engineering; Artificial Intelligence
url	https://arxiv.org/abs/2409.14644