Bibliographic Details
Main Authors: Wang, Sha; Li, Yuchen; Xiao, Hanhua; Dai, Bing Tian; Lee, Roy Ka-Wei; Dong, Yanfei; Deng, Lambert
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2507.10897
Table of Contents:
  • Schema matching is a foundational task in enterprise data integration, aiming to align disparate data sources. While traditional methods handle simple one-to-one table mappings, they often struggle with complex multi-table schema matching in real-world applications. We present LLMatch, a unified and modular schema matching framework. LLMatch decomposes schema matching into three distinct stages: schema preparation, table-candidate selection, and column-level alignment, enabling component-level evaluation and future-proof compatibility. It includes a novel two-stage optimization strategy: a Rollup module that consolidates semantically related columns into higher-order concepts, followed by a Drilldown module that re-expands these concepts for fine-grained column mapping. To address the scarcity of complex semantic matching benchmarks, we introduce SchemaNet, a benchmark derived from real-world schema pairs across three enterprise domains, designed to capture the challenges of multi-table schema alignment in practical settings. Experiments demonstrate that LLMatch significantly improves matching accuracy in complex schema matching settings and substantially boosts engineer productivity in real-world data integration.