Bibliographic Details
Main Authors: Phan, Bich-Chung; Ma, Thanh; Nguyen, Huu-Hoa; Do, Thanh-Nghi
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2510.00907
_version_ 1866912621044695040
author Phan, Bich-Chung
Ma, Thanh
Nguyen, Huu-Hoa
Do, Thanh-Nghi
author_facet Phan, Bich-Chung
Ma, Thanh
Nguyen, Huu-Hoa
Do, Thanh-Nghi
contents Feature selection is a crucial step in analyzing gene expression data: it enhances classification performance and reduces computational cost on high-dimensional datasets. This paper proposes BoMGene, a hybrid feature selection method that integrates two popular techniques: Boruta and Minimum Redundancy Maximum Relevance (mRMR). The method aims to optimize the feature space and enhance classification accuracy. Experiments were conducted on 25 publicly available gene expression datasets, employing widely used classifiers such as Support Vector Machine (SVM), Random Forest, XGBoost (XGB), and Gradient Boosting Machine (GBM). The results show that the Boruta-mRMR combination selects fewer features than mRMR alone, which shortens training time while maintaining or even improving classification accuracy relative to the individual feature selection methods. The proposed approach demonstrates clear advantages in accuracy, stability, and practical applicability for multi-class gene expression data analysis.
format Preprint
id arxiv_https___arxiv_org_abs_2510_00907
institution arXiv
publishDate 2025
record_format arxiv
title BoMGene: Integrating Boruta-mRMR feature selection for enhanced Gene expression classification
topic Machine Learning
url https://arxiv.org/abs/2510.00907