

Bibliographic Details
Main Author:	Reusser, Fredy
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computational Engineering, Finance, and Science
Online Access:	https://arxiv.org/abs/2403.19405

_version_	1866910388981858304
author	Reusser, Fredy
author_facet	Reusser, Fredy
contents	Examining the effect of different encoding techniques on entity and context embeddings, this work challenges the commonly used ordinal encoding for tabular learning. Applying different preprocessing methods and network architectures to several datasets yielded a benchmark of how the encoders influence the learning outcome of the networks. With the test, validation and training data kept consistent across experiments, the results show that ordinal encoding is not the best-suited encoder for categorical data, either for preprocessing the data or for subsequently classifying the target variable correctly. A better outcome was achieved by encoding the features based on string similarities, computing a similarity matrix as input for the network. This holds for both entity and context embeddings, where the transformer architecture showed improved performance with ordinal and similarity encoding on multi-label classification tasks.
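The contrast the abstract draws between ordinal and similarity encoding can be illustrated with a minimal sketch. The record does not state which string-similarity measure the work uses, so this example assumes Jaccard similarity over character 3-grams, one common choice; the function names are hypothetical and this is not the author's implementation. Ordinal encoding gives each category an arbitrary integer, while similarity encoding gives each value a row of similarities to every known category, so related strings such as "London" and "Londres" end up close.

```python
from typing import Dict, List, Sequence, Set


def ordinal_encode(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to an integer index (sorted order)."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the string padded with one space on each side."""
    padded = f" {text} "
    if len(padded) < n:
        return {padded}
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Assumed measure: Jaccard similarity of the character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def similarity_encode(values: Sequence[str], categories: Sequence[str]) -> List[List[float]]:
    """Similarity matrix: row i holds the similarity of values[i] to each category."""
    return [[ngram_similarity(v, c) for c in categories] for v in values]


if __name__ == "__main__":
    column = ["London", "Londres", "Paris", "London"]
    categories = sorted(set(column))
    print(ordinal_encode(column))  # {'London': 0, 'Londres': 1, 'Paris': 2}
    matrix = similarity_encode(column, categories)
    print(matrix[0])  # [1.0, 0.3, 0.0]
```

The similarity matrix (one row per value, one column per reference category) is what would be fed to the network in place of the single ordinal index; an embedding layer for ordinal encoding would instead look up a learned vector per integer.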
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_19405
institution	arXiv
publishDate	2024
record_format	arxiv
title	Tabular Learning: Encoding for Entity and Context Embeddings
topic	Machine Learning; Artificial Intelligence; Computational Engineering, Finance, and Science
url	https://arxiv.org/abs/2403.19405

Similar Items