Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Kim, Myung Jun, Grinsztajn, Léo, Varoquaux, Gaël
Formato:	Preprint
Publicado:	2024
Materias:	Machine Learning
Acceso en línea:	https://arxiv.org/abs/2402.16785
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866914817501036544
author	Kim, Myung Jun Grinsztajn, Léo Varoquaux, Gaël
author_facet	Kim, Myung Jun Grinsztajn, Léo Varoquaux, Gaël
contents	Pretrained deep-learning models are the go-to solution for images or text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences, correspondences in the entries (entity matching) where different words may denote the same entity, correspondences across columns (schema matching), which may come in different orders, names... We propose a neural architecture that does not need such correspondences. As a result, we can pretrain it on background data that has not been matched. The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embedding of entries and columns names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries. An extensive benchmark shows that CARTE facilitates learning, outperforming a solid set of baselines including the best tree-based models. CARTE also enables joint learning across tables with unmatched columns, enhancing a small table with bigger ones. CARTE opens the door to large pretrained models for tabular data.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_16785
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	CARTE: Pretraining and Transfer for Tabular Learning Kim, Myung Jun Grinsztajn, Léo Varoquaux, Gaël Machine Learning Pretrained deep-learning models are the go-to solution for images or text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences, correspondences in the entries (entity matching) where different words may denote the same entity, correspondences across columns (schema matching), which may come in different orders, names... We propose a neural architecture that does not need such correspondences. As a result, we can pretrain it on background data that has not been matched. The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embedding of entries and columns names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries. An extensive benchmark shows that CARTE facilitates learning, outperforming a solid set of baselines including the best tree-based models. CARTE also enables joint learning across tables with unmatched columns, enhancing a small table with bigger ones. CARTE opens the door to large pretrained models for tabular data.
title	CARTE: Pretraining and Transfer for Tabular Learning
topic	Machine Learning
url	https://arxiv.org/abs/2402.16785

Ejemplares similares