MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Mbodji, Fatou Ndiaye; Diallo, El-hacen; Samhi, Jordan; Liu, Kui; Klein, Jacques; Bissyande, Tegawendé F.
Format:	Preprint
Published:	2025
Subjects:	Software Engineering; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.02166

_version_	1866911189183758336
author	Mbodji, Fatou Ndiaye; Diallo, El-hacen; Samhi, Jordan; Liu, Kui; Klein, Jacques; Bissyande, Tegawendé F.
author_facet	Mbodji, Fatou Ndiaye; Diallo, El-hacen; Samhi, Jordan; Liu, Kui; Klein, Jacques; Bissyande, Tegawendé F.
contents	Code agents and empirical software engineering rely on public code datasets, yet these datasets lack verifiable quality guarantees. Static 'dataset cards' inform, but they are neither auditable nor do they offer statistical guarantees, making it difficult to attest to dataset quality. Teams build isolated, ad-hoc cleaning pipelines. This fragments effort and raises cost. We present SIEVE, a community-driven framework. It turns per-property checks into Confidence Cards: machine-readable, verifiable certificates with anytime-valid statistical bounds. We outline a research plan to bring SIEVE to maturity, replacing narrative cards with anytime-verifiable certification. This shift is expected to lower quality-assurance costs and increase trust in code-datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_02166
institution	arXiv
publishDate	2025
record_format	arxiv
title	SIEVE: Towards Verifiable Certification for Code-datasets
topic	Software Engineering; Artificial Intelligence
url	https://arxiv.org/abs/2510.02166
