Bibliographic Details
Main Authors: Putrama, I Made, Martinek, Peter
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2501.04099
_version_ 1866915265114013696
author Putrama, I Made
Martinek, Peter
author_facet Putrama, I Made
Martinek, Peter
contents Imbalanced multiclass datasets pose challenges for machine learning algorithms. These datasets often contain minority classes that are important for accurate prediction. Existing methods still suffer from sparse data and may not accurately represent the original data patterns, leading to noise and poor model performance. A hybrid method called Neighbor Displacement-based Enhanced Synthetic Oversampling (NDESO) is proposed in this paper. This approach uses a displacement strategy for noisy data points, computing the average distance to their neighbors and moving them closer to their centroids. Random oversampling is then performed to achieve dataset balance. Extensive evaluations compare 14 alternatives on nine classifiers across synthetic and 20 real-world datasets with varying imbalance ratios. The results show that our method outperforms its competitors regarding average G-mean score and achieves the lowest statistical mean rank. This highlights its superiority and suitability for addressing data imbalance in practical applications.
format Preprint
id arxiv_https___arxiv_org_abs_2501_04099
institution arXiv
publishDate 2025
record_format arxiv
title Neighbor displacement-based enhanced synthetic oversampling for multiclass imbalanced data
topic Machine Learning
url https://arxiv.org/abs/2501.04099
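The two steps summarized in the contents field (displacing noisy points toward their class centroid, then random oversampling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the noise rule (a point is treated as noisy when fewer than half of its k nearest neighbors share its label), and the step length (the average neighbor distance, capped at the distance to the centroid) are all assumptions made here for illustration.

```python
import math
import random
from collections import Counter


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def displace_noisy_points(X, y, k=3, step=1.0):
    """Return a copy of X in which points flagged as noisy are moved
    toward the centroid of their own class (hypothetical noise rule)."""
    centroids = {c: centroid([p for p, l in zip(X, y) if l == c]) for c in set(y)}
    k = min(k, len(X) - 1)
    moved = []
    for i, (p, label) in enumerate(zip(X, y)):
        # k nearest neighbors of p as (distance, label) pairs, excluding p itself.
        neighbors = sorted(
            ((math.dist(p, q), y[j]) for j, q in enumerate(X) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        gap = math.dist(p, centroids[label])
        same = sum(1 for _, l in neighbors if l == label)
        # Assumption: a point is "noisy" when under half its neighbors share its label.
        if not neighbors or same * 2 >= len(neighbors) or gap == 0.0:
            moved.append(list(p))
            continue
        # Assumption: step length is the average neighbor distance, capped at the gap.
        avg_dist = sum(d for d, _ in neighbors) / len(neighbors)
        shift = min(step * avg_dist, gap)
        target = centroids[label]
        moved.append([a + (b - a) * shift / gap for a, b in zip(p, target)])
    return moved


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen points of each smaller class until every
    class has as many points as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    largest = max(counts.values())
    X_out, y_out = [list(p) for p in X], list(y)
    for label, count in counts.items():
        pool = [p for p, l in zip(X, y) if l == label]
        for _ in range(largest - count):
            X_out.append(list(rng.choice(pool)))
            y_out.append(label)
    return X_out, y_out


# Toy data: the last "b" point sits inside the "a" cluster.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.3], [0.0, 0.2],
     [3.0, 3.0], [3.2, 3.1], [3.1, 3.3], [0.15, 0.15]]
y = ["a"] * 6 + ["b"] * 4

X_moved = displace_noisy_points(X, y, k=3)
X_bal, y_bal = random_oversample(X_moved, y)
print(Counter(y_bal))
```

In this toy run only the mislabeled-looking "b" point is displaced, and oversampling then duplicates "b" points until both classes have the same size. The paper itself should be consulted for the actual noise criterion and displacement formula.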