Bibliographic Details
Main Authors: Pham, Dung; Ghaleb, Taher A.
Format: Preprint
Published: 2026
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2601.17627
_version_ 1866917221190598656
author Pham, Dung
Ghaleb, Taher A.
author_facet Pham, Dung
Ghaleb, Taher A.
contents AI coding agents can autonomously generate pull requests (PRs), yet little is known about how their contributions compare to those of humans. We analyze 33,596 agent-generated PRs (APRs) and 6,618 human PRs (HPRs) to compare code-change characteristics and message quality. We observe that APR-introduced symbols (functions and classes) are removed much sooner than those in HPRs (median time to removal 3 vs. 34 days) and are also removed more often (symbol churn 7.33% vs. 4.10%), reflecting a focus on other tasks like documentation and test updates. Agents generate stronger commit-level messages (semantic similarity 0.72 vs. 0.68) but lag humans at PR-level summarization (PR-commit similarity 0.86 vs. 0.88). Commit message length is the best predictor of description quality, indicating reliance on individual commits over full-PR reasoning. These findings highlight a gap between agents' micro-level precision and macro-level communication, suggesting opportunities to improve agent-driven development workflows.
format Preprint
id arxiv_https___arxiv_org_abs_2601_17627
institution arXiv
publishDate 2026
record_format arxiv
title Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests
topic Software Engineering
url https://arxiv.org/abs/2601.17627