Table of Contents :: Library Catalog

Bibliographic Details
Main authors:	Oliva, Gustavo A., Rajbahadur, Gopi Krishnan, Bhatia, Aaditya, Zhang, Haoxiang, Chen, Yihao, Chen, Zhilong, Leung, Arthur, Lin, Dayi, Chen, Boyuan, Hassan, Ahmed E.
Format:	Preprint
Published:	2025
Subjects:	Software Engineering, Artificial Intelligence
Online access:	https://arxiv.org/abs/2507.09108

Table of Contents:

High-quality labeled datasets are crucial for training and evaluating foundation models (FMs) in software engineering (SE), but creating them is often prohibitively expensive and labor-intensive. We introduce SPICE, a scalable, automated pipeline for labeling SWE-bench-style datasets with annotations for issue clarity, test coverage, and effort estimation. SPICE combines context-aware code navigation, rationale-driven prompting, and multi-pass consensus to produce labels that closely approximate expert annotations. SPICE's design was informed by our own experience with, and frustration at, labeling more than 800 instances from SWE-Gym. SPICE achieves strong agreement with the human-labeled SWE-bench Verified data while reducing the cost of labeling 1,000 instances from around $100,000 (manual annotation) to just $5.10. These results demonstrate SPICE's potential to enable cost-effective, large-scale dataset creation for SE-focused FMs. To support the community, we release both the SPICE tool and SPICE Bench, a new dataset of 6,802 SPICE-labeled instances curated from 291 open-source projects in SWE-Gym (over 13x larger than SWE-bench Verified).
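The abstract names multi-pass consensus but does not spell out how the passes are combined. As a minimal sketch, assuming each labeling pass returns an integer score (for example on a 0 to 3 ordinal scale) and that consensus means a simple majority vote, the passes for one instance could be aggregated as below. The function name `consensus_label`, the tie-breaking rule (lowest tied score wins), and the returned agreement ratio are illustrative assumptions, not SPICE's documented method.

```python
from collections import Counter
from typing import Sequence


def consensus_label(pass_labels: Sequence[int]) -> tuple[int, float]:
    """Combine integer labels from several independent labeling passes.

    Hypothetical rule (not taken from the SPICE paper): majority vote,
    with ties broken toward the lowest tied score. Also returns the
    fraction of passes that agree with the chosen label.
    """
    if not pass_labels:
        raise ValueError("need at least one labeling pass")
    counts = Counter(pass_labels)
    top = max(counts.values())
    # Every label that shares the highest vote count.
    tied = [label for label, n in counts.items() if n == top]
    # Assumed conservative tie-break: pick the smallest tied score.
    chosen = min(tied)
    return chosen, top / len(pass_labels)


if __name__ == "__main__":
    # Hypothetical per-instance labels from repeated passes.
    passes = {"instance-a": [2, 2, 1], "instance-b": [1, 3, 3, 1]}
    for name, labels in passes.items():
        label, agreement = consensus_label(labels)
        print(name, label, round(agreement, 2))
    # prints:
    # instance-a 2 0.67
    # instance-b 1 0.5
```

Breaking ties toward the lower score is only one possible choice; a real pipeline might instead run an extra pass or route low-agreement instances to a human reviewer.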

Similar Items