Bibliographic Details
Main Authors: Pinto, Gustavo, Naves, Pedro Eduardo de Paula, Camargo, Ana Paula, Silva, Marselle
Format: Preprint
Published: 2026
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2604.09805
author Pinto, Gustavo
Naves, Pedro Eduardo de Paula
Camargo, Ana Paula
Silva, Marselle
contents Enterprise teams building internal coding agents face a gap between prototype performance and production readiness. The root cause is that technical model quality alone is insufficient -- tool design, safety enforcement, state management, and human trust calibration are equally decisive, yet underreported in the literature. We present CodeGen, an internal coding agent at Zup, and show that targeted tool design (e.g., string-replacement edits over full-file rewrites) and layered safety guardrails improved agent reliability more than prompt engineering, while progressive human oversight modes drove organic adoption without mandating trust. These findings suggest that the engineering decisions surrounding the model -- not the model itself -- determine whether a coding agent delivers real value in practice.
format Preprint
id arxiv_https___arxiv_org_abs_2604_09805
institution arXiv
publishDate 2026
record_format arxiv
title Building an Internal Coding Agent at Zup: Lessons and Open Questions
topic Software Engineering
url https://arxiv.org/abs/2604.09805
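
The abstract contrasts string-replacement edits with full-file rewrites as a tool-design choice. The record does not describe how CodeGen implements this, so the following is only a minimal sketch of the general technique, under stated assumptions: the function name `apply_string_replacement`, the `EditError` exception, and the rule that the target text must match exactly once are all hypothetical and are not taken from the paper.

```python
class EditError(ValueError):
    """Raised when an edit cannot be applied unambiguously (hypothetical name)."""


def apply_string_replacement(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` in `source` with `new`.

    Assumption: the edit is rejected unless `old` matches exactly once,
    so the agent cannot silently change the wrong location.
    """
    if not old:
        raise EditError("old text must be non-empty")
    matches = source.count(old)
    if matches == 0:
        raise EditError("old text not found in source")
    if matches > 1:
        raise EditError(f"old text matches {matches} locations; add more context")
    # Only the matched span changes; the rest of the file is left untouched.
    return source.replace(old, new, 1)


if __name__ == "__main__":
    original = "timeout = 30\nretries = 3\n"
    print(apply_string_replacement(original, "retries = 3", "retries = 5"))
```

One plausible reason such a tool could be more reliable than a full-file rewrite is that the model only has to reproduce the small span it intends to change, and an ambiguous or stale match fails loudly instead of corrupting unrelated lines.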