

Bibliographic Details
Main Authors:	Oliva, Gustavo A.; Rajbahadur, Gopi Krishnan; Bhatia, Aaditya; Zhang, Haoxiang; Chen, Yihao; Chen, Zhilong; Leung, Arthur; Lin, Dayi; Chen, Boyuan; Hassan, Ahmed E.
Format:	Preprint
Published:	2025
Subjects:	Software Engineering; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.09108

_version_	1866916955959590912
author	Oliva, Gustavo A.; Rajbahadur, Gopi Krishnan; Bhatia, Aaditya; Zhang, Haoxiang; Chen, Yihao; Chen, Zhilong; Leung, Arthur; Lin, Dayi; Chen, Boyuan; Hassan, Ahmed E.
author_facet	Oliva, Gustavo A.; Rajbahadur, Gopi Krishnan; Bhatia, Aaditya; Zhang, Haoxiang; Chen, Yihao; Chen, Zhilong; Leung, Arthur; Lin, Dayi; Chen, Boyuan; Hassan, Ahmed E.
contents	High-quality labeled datasets are crucial for training and evaluating foundation models in software engineering, but creating them is often prohibitively expensive and labor-intensive. We introduce SPICE, a scalable, automated pipeline for labeling SWE-bench-style datasets with annotations for issue clarity, test coverage, and effort estimation. SPICE combines context-aware code navigation, rationale-driven prompting, and multi-pass consensus to produce labels that closely approximate expert annotations. SPICE's design was informed by our own experience and frustration in labeling more than 800 instances from SWE-Gym. SPICE achieves strong agreement with human-labeled SWE-bench Verified data while reducing the cost of labeling 1,000 instances from around $100,000 (manual annotation) to just $5.10. These results demonstrate SPICE's potential to enable cost-effective, large-scale dataset creation for SE-focused FMs. To support the community, we release both the SPICE tool and SPICE Bench, a new dataset of 6,802 SPICE-labeled instances curated from 291 open-source projects in SWE-Gym (over 13x larger than SWE-bench Verified).
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_09108
institution	arXiv
publishDate	2025
record_format	arxiv
title	SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation
topic	Software Engineering; Artificial Intelligence
url	https://arxiv.org/abs/2507.09108
