Staff View: Neural Variable Name Repair: Learning to Rename Identifiers for Readability :: Library Catalog

Bibliographic Details
Main Authors:	Yousuf, Muhammad; Bagade, Akshat; Penugonda, Chhittebbayi; Baraya, Maanas
Format:	Preprint
Published:	2025
Subjects:	Software Engineering; Machine Learning
Online Access:	https://arxiv.org/abs/2512.01141

_version_	1866917115696513024
author	Yousuf, Muhammad; Bagade, Akshat; Penugonda, Chhittebbayi; Baraya, Maanas
author_facet	Yousuf, Muhammad; Bagade, Akshat; Penugonda, Chhittebbayi; Baraya, Maanas
contents	Developers routinely work with source files whose variable names are generic or misleading, and with teams moving quickly, many functions are left undocumented. This slows comprehension, increases the risk of subtle bugs, and makes it harder for both humans and large language models (LLMs) to reason about code. We study variable name repair: given a real C++ function where all occurrences of one local or parameter name have been replaced by a placeholder (e.g., ID 1), the goal is to generate a natural, descriptive replacement name. We automatically construct this task from the C++ portion of BigCode's The Stack by parsing functions with Tree-sitter, masking a single identifier, and treating the original name as supervision. On top of Llama 3.1-8B, we build a pipeline with (i) warmup and dropout schedules for more stable fine-tuning, (ii) LoRA adapters for efficient specialization on identifier repair, and (iii) a dual-encoder reranker over top-k generator candidates. We evaluate using exact match, Top-5 Hit, and an embedding-based partial similarity score (0-100) that gives credit for near-synonyms and format variants (e.g., jsonValue vs. json). On a held-out set of 200 C++ functions, a zero-shot Llama 3.1 baseline reaches 6.1 percent exact match. Our best LoRA-tuned model (with warmup and dropout) achieves 43.1 percent exact match, 50.2 percent Top-5 Hit, and an 82.03 partial-match score. A dual-encoder reranker further improves selection quality without modifying the underlying generator, suggesting that task-specific fine-tuning plus reranking is a promising approach for practical identifier repair tools.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_01141
institution	arXiv
publishDate	2025
record_format	arxiv
title	Neural Variable Name Repair: Learning to Rename Identifiers for Readability
topic	Software Engineering; Machine Learning
url	https://arxiv.org/abs/2512.01141
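
Illustrative note on the task described in the contents field: the record describes masking one identifier in a C++ function and scoring ranked name suggestions by exact match and Top-5 Hit. The sketch below is only a rough illustration of that setup, not the authors' pipeline. It assumes a regex word-boundary substitution in place of the Tree-sitter parsing the abstract mentions (so it is not scope-aware), and the function names `mask_identifier`, `exact_match`, and `top_k_hit`, the placeholder spelling `ID_1`, and the sample candidates are all hypothetical.

```python
import re


def mask_identifier(source: str, name: str, placeholder: str = "ID_1") -> str:
    """Replace every whole-word occurrence of `name` with `placeholder`.

    Assumption: a regex stand-in for a real parser; it ignores scoping,
    strings, and comments.
    """
    return re.sub(rf"\b{re.escape(name)}\b", placeholder, source)


def exact_match(candidates: list[str], target: str) -> bool:
    """True when the top-ranked candidate equals the original name."""
    return bool(candidates) and candidates[0] == target


def top_k_hit(candidates: list[str], target: str, k: int = 5) -> bool:
    """True when the original name appears among the first k candidates."""
    return target in candidates[:k]


# Hypothetical C++ function and hypothetical ranked model suggestions.
function = (
    "int sum(const std::vector<int>& values) "
    "{ int total = 0; for (int v : values) total += v; return total; }"
)
masked = mask_identifier(function, "total")
ranked = ["sum", "total", "result"]

print(masked)
print(exact_match(ranked, "total"), top_k_hit(ranked, "total"))
```

In this toy case the original name is ranked second, so it counts toward Top-5 Hit but not toward exact match, which is the gap a reranker over the candidate list is meant to close.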

Similar Items