Saved in:
Bibliographic Details
Main Authors: Yousuf, Muhammad, Bagade, Akshat, Penugonda, Chhittebbayi, Baraya, Maanas
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2512.01141
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866917115696513024
author Yousuf, Muhammad
Bagade, Akshat
Penugonda, Chhittebbayi
Baraya, Maanas
author_facet Yousuf, Muhammad
Bagade, Akshat
Penugonda, Chhittebbayi
Baraya, Maanas
contents Developers routinely work with source files whose variable names are generic or misleading, and with teams moving quickly, many functions are left undocumented. This slows comprehension, increases the risk of subtle bugs, and makes it harder for both humans and large language models (LLMs) to reason about code. We study variable name repair: given a real C++ function where all occurrences of one local or parameter name have been replaced by a placeholder (e.g. ID 1), the goal is to generate a natural, descriptive replacement name. We automatically construct this task from the C++ portion of BigCode's The Stack by parsing functions with Tree-sitter, masking a single identifier, and treating the original name as supervision. On top of Llama 3.1-8B, we build a pipeline with (i) warmup and dropout schedules for more stable fine-tuning, (ii) LoRA adapters for efficient specialization on identifier repair, and (iii) a dual-encoder reranker over top-k generator candidates. We evaluate using exact match, Top-5 Hit, and an embedding-based partial similarity score (0-100) that gives credit for near synonyms and format variants (e.g., jsonValue vs. json). On a held-out set of 200 C++ functions, a zero-shot Llama 3.1 baseline reaches 6.1 percent exact match. Our best LoRA-tuned model (with warmup and dropout) achieves 43.1 percent exact match, 50.2 percent Top-5 Hit, and an 82.03 partial-match score. A dual encoder reranker further improves selection quality without modifying the underlying generator, suggesting that task-specific fine-tuning plus reranking is a promising approach for practical identifier repair tools.
format Preprint
id arxiv_https___arxiv_org_abs_2512_01141
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Neural Variable Name Repair: Learning to Rename Identifiers for Readability
Yousuf, Muhammad
Bagade, Akshat
Penugonda, Chhittebbayi
Baraya, Maanas
Software Engineering
Machine Learning
Developers routinely work with source files whose variable names are generic or misleading, and with teams moving quickly, many functions are left undocumented. This slows comprehension, increases the risk of subtle bugs, and makes it harder for both humans and large language models (LLMs) to reason about code. We study variable name repair: given a real C++ function where all occurrences of one local or parameter name have been replaced by a placeholder (e.g. ID 1), the goal is to generate a natural, descriptive replacement name. We automatically construct this task from the C++ portion of BigCode's The Stack by parsing functions with Tree-sitter, masking a single identifier, and treating the original name as supervision. On top of Llama 3.1-8B, we build a pipeline with (i) warmup and dropout schedules for more stable fine-tuning, (ii) LoRA adapters for efficient specialization on identifier repair, and (iii) a dual-encoder reranker over top-k generator candidates. We evaluate using exact match, Top-5 Hit, and an embedding-based partial similarity score (0-100) that gives credit for near synonyms and format variants (e.g., jsonValue vs. json). On a held-out set of 200 C++ functions, a zero-shot Llama 3.1 baseline reaches 6.1 percent exact match. Our best LoRA-tuned model (with warmup and dropout) achieves 43.1 percent exact match, 50.2 percent Top-5 Hit, and an 82.03 partial-match score. A dual encoder reranker further improves selection quality without modifying the underlying generator, suggesting that task-specific fine-tuning plus reranking is a promising approach for practical identifier repair tools.
title Neural Variable Name Repair: Learning to Rename Identifiers for Readability
topic Software Engineering
Machine Learning
url https://arxiv.org/abs/2512.01141