Bibliographic Details
Main Authors: Azad, Fatemeh; Bosnić, Zoran; Kukar, Matjaž
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2509.03316
author Azad, Fatemeh
Bosnić, Zoran
Kukar, Matjaž
contents Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. The proposed method, Meta-Imputation Balanced (MIB), is trained on synthetically masked data with known ground truth, so that it learns to predict the most suitable imputed value based on the behavior of each base imputer. Our work highlights the potential of ensemble learning in imputation and paves the way for more robust, modular, and interpretable preprocessing pipelines in real-world machine learning systems.
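The idea in the abstract (mask known values, run several base imputers on the masked cells, then learn how to combine their outputs against the known ground truth) can be illustrated with a small sketch. This is not the authors' MIB implementation: the two base imputers, the single convex blending weight, the toy data, and every function name below are assumptions chosen only to keep the example short and dependency-free.

```python
import random
import statistics


def mean_impute(reference_rows, x):
    """Hypothetical base imputer 1: ignore x, return the mean of observed y."""
    return statistics.fmean(y for _, y in reference_rows)


def nearest_neighbour_impute(reference_rows, x):
    """Hypothetical base imputer 2: return y of the observed row closest in x."""
    return min(reference_rows, key=lambda row: abs(row[0] - x))[1]


def fit_blend_weight(pred_a, pred_b, truth):
    """Least-squares weight w for w * a + (1 - w) * b, clipped to [0, 1].

    A deliberately simple stand-in for a meta-learner (an assumption,
    not the model described in the paper).
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(pred_a, pred_b, truth))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        return 0.5  # the two imputers agree everywhere; any weight works
    return min(1.0, max(0.0, num / den))


def mse(pred, truth):
    """Mean squared error between predictions and ground truth."""
    return statistics.fmean((p - t) ** 2 for p, t in zip(pred, truth))


# Toy complete data (assumed): y depends linearly on x plus Gaussian noise.
rng = random.Random(0)
rows = [(x, 2.0 * x + rng.gauss(0.0, 1.0))
        for x in (rng.uniform(0.0, 10.0) for _ in range(200))]

# reference: rows the base imputers may look at
# masked:    rows whose y is hidden synthetically; truth is kept for training
# held_out:  rows used only to evaluate the learned combination
reference, masked, held_out = rows[:100], rows[100:150], rows[150:]


def base_predictions(cells):
    """Run both base imputers on every cell, using only the reference rows."""
    a = [mean_impute(reference, x) for x, _ in cells]
    b = [nearest_neighbour_impute(reference, x) for x, _ in cells]
    return a, b


# Train the combiner on synthetically masked cells with known ground truth.
a_fit, b_fit = base_predictions(masked)
w = fit_blend_weight(a_fit, b_fit, [y for _, y in masked])

# Apply the learned combination to unseen cells.
a_new, b_new = base_predictions(held_out)
blended = [w * a + (1.0 - w) * b for a, b in zip(a_new, b_new)]
truth_new = [y for _, y in held_out]

print("learned weight on mean imputer:", w)
print("MSE mean / nearest / blended:",
      mse(a_new, truth_new), mse(b_new, truth_new), mse(blended, truth_new))
```

Because the weight is restricted to [0, 1], the blended error can never exceed that of the worse base imputer. A richer meta-learner would typically also use per-cell features and more than two base imputers.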
format Preprint
id arxiv_https___arxiv_org_abs_2509_03316
institution arXiv
publishDate 2025
record_format arxiv
title Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning
topic Machine Learning
url https://arxiv.org/abs/2509.03316