Bibliographic Details
Main Authors: Adhikari, Aayush, Bhatta, Sandesh, Jangwan, Harendra S., Mishra, Amit, Nisa, Khair Ul, Zamani, Abu Taha, Sapkota, Aaron, Muduli, Debendra, Parveen, Nikhat
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2507.04251
_version_ 1866918085035819008
author Adhikari, Aayush
Bhatta, Sandesh
Jangwan, Harendra S.
Mishra, Amit
Nisa, Khair Ul
Zamani, Abu Taha
Sapkota, Aaron
Muduli, Debendra
Parveen, Nikhat
contents High dimensionality in datasets produced by microarray technology presents a challenge for Machine Learning (ML) algorithms, particularly in terms of dimensionality reduction and handling imbalanced sample sizes. To mitigate these problems, we propose hybrid ensemble feature selection techniques with a majority voting classifier for microarray classification. We consider both filter- and wrapper-based feature selection techniques, including Mutual Information (MI), Chi-Square, Variance Threshold (VT), Least Absolute Shrinkage and Selection Operator (LASSO), Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE), followed by Particle Swarm Optimization (PSO) to select the optimal features. This Artificial Intelligence (AI) approach uses a Majority Voting Classifier that combines multiple machine learning models, such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to enhance overall performance and accuracy. By leveraging the strengths of each model, the ensemble approach aims to provide more reliable and effective diagnostic predictions. The efficacy of the proposed model has been tested in both local and cloud environments. In the cloud environment, three virtual machines with virtual Central Processing Unit (vCPU) sizes of 8, 16, and 64 bits were used to demonstrate model performance. The experiments show that the vCPU-64 bits configuration provides better classification accuracies of 95.89%, 97.50%, 99.13%, 99.58%, 99.11%, and 94.60% on six microarray datasets, Mixed Lineage Leukemia (MLL), Leukemia, Small Round Blue Cell Tumors (SRBCT), Lymphoma, Ovarian, and Lung, respectively, validating the effectiveness of the proposed model in both local and cloud environments.
format Preprint
id arxiv_https___arxiv_org_abs_2507_04251
institution arXiv
publishDate 2025
record_format arxiv
title A Two-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments
topic Machine Learning
url https://arxiv.org/abs/2507.04251
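
The abstract in the contents field describes two voting steps: several feature selectors are combined into one feature subset, and several classifiers are combined by majority vote. The record does not say how the votes are aggregated, so the following is only a minimal sketch under stated assumptions. The function names `ensemble_select` and `majority_vote`, the `top_k` and `min_votes` parameters, and the tie-breaking rule are all hypothetical, and the PSO refinement stage is omitted entirely.

```python
from collections import Counter


def ensemble_select(rankings, top_k, min_votes):
    """Combine several feature rankings into one subset (assumed rule).

    rankings:  one list per selector (e.g. MI, Chi-Square, ANOVA), each
               holding feature indices ordered from best to worst.
    top_k:     how many top features each selector nominates.
    min_votes: how many selectors must nominate a feature to keep it.
    """
    votes = Counter()
    for ranked in rankings:
        votes.update(ranked[:top_k])
    return sorted(f for f, n in votes.items() if n >= min_votes)


def majority_vote(predictions):
    """Hard majority vote across models, one label per sample.

    predictions: one list of labels per model, all of equal length.
    Ties go to the tied label predicted by the earliest model in the
    list (an assumption made here for determinism).
    """
    voted = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        best = max(counts.values())
        voted.append(next(p for p in sample_preds if counts[p] == best))
    return voted


if __name__ == "__main__":
    # Three hypothetical selectors ranking four features.
    rankings = [[3, 1, 2, 0], [1, 3, 0, 2], [2, 1, 3, 0]]
    print(ensemble_select(rankings, top_k=2, min_votes=2))

    # Three hypothetical models (e.g. LR, RF, XGBoost) on three samples.
    model_preds = [[0, 1, 1], [0, 0, 1], [1, 0, 1]]
    print(majority_vote(model_preds))
```

In a fuller pipeline, the subset returned by `ensemble_select` would be the starting point for the PSO search, and the label lists passed to `majority_vote` would come from fitted classifiers rather than hand-written values.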