Staff View: CaTE Data Curation for Trustworthy AI :: Library Catalog

Bibliographic Details
Main Authors:	Clemens-Sewall, Mary Versa; Cervantes, Christopher; Rafkin, Emma; Otte, J. Neil; Magelinski, Tom; Lewis, Libby; Liu, Michelle; Udwin, Dana; Kirkman-Bey, Monique
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2508.14741

_version_	1866913998693203968
author	Clemens-Sewall, Mary Versa; Cervantes, Christopher; Rafkin, Emma; Otte, J. Neil; Magelinski, Tom; Lewis, Libby; Liu, Michelle; Udwin, Dana; Kirkman-Bey, Monique
author_facet	Clemens-Sewall, Mary Versa; Cervantes, Christopher; Rafkin, Emma; Otte, J. Neil; Magelinski, Tom; Lewis, Libby; Liu, Michelle; Udwin, Dana; Kirkman-Bey, Monique
contents	This report provides practical guidance to teams designing or developing AI-enabled systems on how to promote trustworthiness during the data curation phase of development. We first define data, the data curation phase, and trustworthiness. We then describe a series of steps that the development team, especially data scientists, can take to build a trustworthy AI-enabled system. We enumerate the sequence of core steps and trace parallel paths where alternatives exist. The description of each step covers its strengths, weaknesses, preconditions, outcomes, and relevant open-source software tool implementations. Overall, this report synthesizes data curation tools and approaches from the relevant academic literature; our goal is to equip readers with a diverse yet coherent set of practices for improving AI trustworthiness.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_14741
institution	arXiv
publishDate	2025
record_format	arxiv
title	CaTE Data Curation for Trustworthy AI
topic	Machine Learning
url	https://arxiv.org/abs/2508.14741
