Staff View: SnipGen: A Mining Repository Framework for Evaluating LLMs for Code :: Library Catalog

Bibliographic Details
Main Authors:	Rodriguez-Cardenas, Daniel; Velasco, Alejandro; Poshyvanyk, Denys
Format:	Preprint
Published:	2025
Subjects:	Software Engineering; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2502.07046

_version_	1866913692155641856
author	Rodriguez-Cardenas, Daniel; Velasco, Alejandro; Poshyvanyk, Denys
author_facet	Rodriguez-Cardenas, Daniel; Velasco, Alejandro; Poshyvanyk, Denys
contents	Large Language Models (LLMs), such as transformer-based neural networks with billions of parameters, have become increasingly prevalent in software engineering (SE). These models, trained on extensive datasets that include code repositories, exhibit remarkable capabilities for SE tasks. However, evaluating their effectiveness poses significant challenges, primarily due to the potential overlap between the datasets used for training and those employed for evaluation. To address this issue, we introduce SnipGen, a comprehensive repository mining framework designed to leverage prompt engineering across various downstream tasks for code generation. SnipGen aims to mitigate data contamination by generating robust testbeds and crafting tailored data points to assist researchers and practitioners in evaluating LLMs for code-related tasks. In our exploratory study, SnipGen mined approximately 227K data points from 338K recent code changes in GitHub commits, focusing on method-level granularity. SnipGen features a collection of prompt templates that can be combined to create a Chain-of-Thought-like sequence of prompts, enabling a nuanced assessment of LLMs' code generation quality. By providing the mining tool, the methodology, and the dataset, SnipGen empowers researchers and practitioners to rigorously evaluate and interpret LLMs' performance in software engineering contexts.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_07046
institution	arXiv
publishDate	2025
record_format	arxiv
title	SnipGen: A Mining Repository Framework for Evaluating LLMs for Code
topic	Software Engineering; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2502.07046

Similar Items