Staff View: Agentic Bug Reproduction for Effective Automated Program Repair at Google :: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Runxiang; Tufano, Michele; Cito, Jürgen; Cambronero, José; Rondon, Pat; Wei, Renyao; Sun, Aaron; Chandra, Satish
Format:	Preprint
Published:	2025
Subjects:	Software Engineering; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.01821

_version_	1866910869270560768
author	Cheng, Runxiang; Tufano, Michele; Cito, Jürgen; Cambronero, José; Rondon, Pat; Wei, Renyao; Sun, Aaron; Chandra, Satish
author_facet	Cheng, Runxiang; Tufano, Michele; Cito, Jürgen; Cambronero, José; Rondon, Pat; Wei, Renyao; Sun, Aaron; Chandra, Satish
contents	Bug reports often lack sufficient detail for developers to reproduce and fix the underlying defects. Bug Reproduction Tests (BRTs), tests that fail when the bug is present and pass when it has been resolved, are crucial for debugging, but they are rarely included in bug reports, both in open-source and in industrial settings. Thus, automatically generating BRTs from bug reports has the potential to accelerate the debugging process and lower time to repair. This paper investigates automated BRT generation within an industry setting, specifically at Google, focusing on the challenges of a large-scale, proprietary codebase and considering real-world industry bugs extracted from Google's internal issue tracker. We adapt and evaluate a state-of-the-art BRT generation technique, LIBRO, and present our agent-based approach, BRT Agent, which uses a fine-tuned Large Language Model (LLM) for code editing. Our BRT Agent significantly outperforms LIBRO, achieving a 28% plausible BRT generation rate, compared to 10% by LIBRO, on 80 human-reported bugs from Google's internal issue tracker. We further investigate the practical value of generated BRTs by integrating them with an Automated Program Repair (APR) system at Google. Our results show that providing BRTs to the APR system results in 30% more bugs with plausible fixes. Additionally, we introduce Ensemble Pass Rate (EPR), a metric that leverages the generated BRTs to select the most promising fixes from all fixes generated by the APR system. Our evaluation of EPR for Top-K and threshold-based fix selection demonstrates promising results and trade-offs. For example, EPR correctly selects a plausible fix from a pool of 20 candidates in 70% of cases, based on its top-1 ranking.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_01821
institution	arXiv
publishDate	2025
record_format	arxiv
title	Agentic Bug Reproduction for Effective Automated Program Repair at Google
topic	Software Engineering; Artificial Intelligence
url	https://arxiv.org/abs/2502.01821

Similar Items