Bibliographic Details
Main Authors: Guo, Chuanzhe, Wu, Jingjing, He, Sijun, Chen, Yang, Kuang, Zhaoqi, Fan, Shilong, Chen, Bingjin, Bao, Siqi, Liu, Jing, Wu, Hua, Zhu, Qingfu, Che, Wanxiang, Wang, Haifeng
Format: Preprint
Published: 2026
Subjects: Software Engineering, Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.22859
author Guo, Chuanzhe
Wu, Jingjing
He, Sijun
Chen, Yang
Kuang, Zhaoqi
Fan, Shilong
Chen, Bingjin
Bao, Siqi
Liu, Jing
Wu, Hua
Zhu, Qingfu
Che, Wanxiang
Wang, Haifeng
contents The evolution of Large Language Model (LLM) agents for software engineering (SWE) is constrained by the scarcity of verifiable datasets, a bottleneck stemming from the complexity of constructing executable environments across diverse languages. To address this, we introduce MEnvAgent, a Multi-language framework for automated Environment construction that facilitates scalable generation of verifiable task instances. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures and integrates a novel Environment Reuse Mechanism that reduces computational overhead by incrementally patching historical environments. Evaluations on MEnvBench, a new benchmark comprising 1,000 tasks across 10 languages, demonstrate that MEnvAgent outperforms baselines, improving Fail-to-Pass (F2P) rates by 8.6% while reducing time costs by 43%. Additionally, we demonstrate the utility of MEnvAgent by constructing MEnvData-SWE, the largest open-source polyglot dataset of realistic verifiable Docker environments to date, alongside solution trajectories that enable consistent performance gains on SWE tasks across a wide range of models. Our code, benchmark, and dataset are available at https://github.com/ernie-research/MEnvAgent.
format Preprint
id arxiv_https___arxiv_org_abs_2601_22859
institution arXiv
publishDate 2026
record_format arxiv
title MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering
topic Software Engineering
Artificial Intelligence
url https://arxiv.org/abs/2601.22859