Bibliographic Details
Main Authors: Bronselaer, Antoon; Acosta, Maribel
Format: Preprint
Published: 2022
Online Access: https://arxiv.org/abs/2202.12184
Table of Contents:
  • When combining data from multiple sources, inconsistent data complicates the production of a coherent result. In this paper, we introduce a new type of constraint called edit rules under a partial key (EPKs). These constraints can model inconsistencies both within and between sources, but in a loosely coupled manner. We show that the well-known set cover methodology can be adapted to the setting of EPKs, which yields an efficient algorithm for finding minimal-cost repairs of sources. This algorithm is implemented in a repair engine called Parker. Empirical results show that Parker is several orders of magnitude faster than state-of-the-art repair tools. At the same time, the quality of its repairs, measured by $F_1$-score, ranges from comparable to better than that of these tools.
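
The set cover idea mentioned in the abstract can be illustrated with a minimal sketch. This is the generic greedy heuristic for weighted set cover, not Parker's actual algorithm, and it yields an approximate rather than a guaranteed minimal-cost cover. It assumes, for illustration only, that each violated rule must be "covered" by changing at least one attribute involved in it. The function name, attribute names, violation labels, and costs are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_set_cover(
    universe: Iterable[Hashable],
    subsets: Dict[str, Set[Hashable]],
    costs: Dict[str, float],
) -> List[str]:
    """Greedy weighted set cover: repeatedly pick the subset with the
    lowest cost per newly covered element until everything is covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, members in subsets.items():
            gain = len(uncovered & members)
            if gain == 0:
                continue  # this subset covers nothing new
            ratio = costs[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical example: three rule violations, and for each attribute
# the set of violations that changing it would resolve.
violations = {"v1", "v2", "v3"}
resolves = {
    "city": {"v1", "v2"},
    "zip": {"v2", "v3"},
    "name": {"v1", "v2", "v3"},
}
change_cost = {"city": 1.0, "zip": 1.0, "name": 3.0}

to_change = greedy_set_cover(violations, resolves, change_cost)
total_cost = sum(change_cost[a] for a in to_change)
```

In this toy input, changing "city" and "zip" together resolves all three violations at a total cost of 2, which is cheaper than changing "name" alone at a cost of 3.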