Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Fdez-Díaz, Laura, Tomillo, Sara González, Montañés, Elena, Quevedo, José Ramón
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.01450
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866929231676571648
author	Fdez-Díaz, Laura Tomillo, Sara González Montañés, Elena Quevedo, José Ramón
author_facet	Fdez-Díaz, Laura Tomillo, Sara González Montañés, Elena Quevedo, José Ramón
contents	In traditional Machine Learning, the algorithms predictions are based on the assumption that the data follows the same distribution in both the training and the test datasets. However, in real world data this condition does not hold and, for instance, the distribution of the covariates changes whereas the conditional distribution of the targets remains unchanged. This situation is called covariate shift problem where standard error estimation may be no longer accurate. In this context, the importance is a measure commonly used to alleviate the influence of covariate shift on error estimations. The main drawback is that it is not easy to compute. The Kullback-Leibler Importance Estimation Procedure (KLIEP) is capable of estimating importance in a promising way. Despite its good performance, it fails to ignore target information, since it only includes the covariates information for computing the importance. In this direction, this paper explores the potential performance improvement if target information is considered in the computation of the importance. Then, a redefinition of the importance arises in order to be generalized in this way. Besides the potential improvement in performance, including target information make possible the application to a real application about plankton classification that motivates this research and characterized by its great dimensionality, since considering targets rather than covariates reduces the computation and the noise in the covariates. The impact of taking target information is also explored when Logistic Regression (LR), Kernel Mean Matching (KMM), Ensemble Kernel Mean Matching (EKMM) and the naive predecessor of KLIEP called Kernel Density Estimation (KDE) methods estimate the importance. The experimental results lead to a more accurate error estimation using target information, especially in case of the more promising method KLIEP.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_01450
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Improving importance estimation in covariate shift for providing accurate prediction error Fdez-Díaz, Laura Tomillo, Sara González Montañés, Elena Quevedo, José Ramón Machine Learning In traditional Machine Learning, the algorithms predictions are based on the assumption that the data follows the same distribution in both the training and the test datasets. However, in real world data this condition does not hold and, for instance, the distribution of the covariates changes whereas the conditional distribution of the targets remains unchanged. This situation is called covariate shift problem where standard error estimation may be no longer accurate. In this context, the importance is a measure commonly used to alleviate the influence of covariate shift on error estimations. The main drawback is that it is not easy to compute. The Kullback-Leibler Importance Estimation Procedure (KLIEP) is capable of estimating importance in a promising way. Despite its good performance, it fails to ignore target information, since it only includes the covariates information for computing the importance. In this direction, this paper explores the potential performance improvement if target information is considered in the computation of the importance. Then, a redefinition of the importance arises in order to be generalized in this way. Besides the potential improvement in performance, including target information make possible the application to a real application about plankton classification that motivates this research and characterized by its great dimensionality, since considering targets rather than covariates reduces the computation and the noise in the covariates. The impact of taking target information is also explored when Logistic Regression (LR), Kernel Mean Matching (KMM), Ensemble Kernel Mean Matching (EKMM) and the naive predecessor of KLIEP called Kernel Density Estimation (KDE) methods estimate the importance. The experimental results lead to a more accurate error estimation using target information, especially in case of the more promising method KLIEP.
title	Improving importance estimation in covariate shift for providing accurate prediction error
topic	Machine Learning
url	https://arxiv.org/abs/2402.01450

Similar Items