Staff View: BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification

Bibliographic Details
Main Authors:	Phan, Bich-Chung; Ma, Thanh; Nguyen, Huu-Hoa; Do, Thanh-Nghi
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.13080

_version_	1866912246816309248
author	Phan, Bich-Chung; Ma, Thanh; Nguyen, Huu-Hoa; Do, Thanh-Nghi
author_facet	Phan, Bich-Chung; Ma, Thanh; Nguyen, Huu-Hoa; Do, Thanh-Nghi
contents	Gene expression classification is a pivotal yet challenging task in bioinformatics, primarily due to the high dimensionality of genomic data and the risk of overfitting. To bridge this gap, we propose BOLIMES, a novel feature selection algorithm designed to enhance gene expression classification by systematically refining the feature subset. Unlike conventional methods that rely solely on statistical ranking or classifier-specific selection, we integrate the robustness of Boruta with the interpretability of LIME, ensuring that only the most relevant and influential genes are retained. BOLIMES first employs Boruta to filter out non-informative genes by comparing each feature against its randomized counterpart, thus preserving valuable information. It then uses LIME to rank the remaining genes based on their local importance to the classifier. Finally, an iterative classification evaluation determines the optimal feature subset by selecting the number of genes that maximizes predictive accuracy. By combining exhaustive feature selection with interpretability-driven refinement, our solution effectively balances dimensionality reduction with high classification performance, offering a powerful solution for high-dimensional gene expression analysis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_13080
institution	arXiv
publishDate	2025
record_format	arxiv
title	BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2502.13080
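
Illustrative note on the abstract: the final step it describes, choosing the number of genes that maximizes predictive accuracy, can be sketched as a search over top-k prefixes of a ranked gene list. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the Boruta filtering and LIME ranking have already produced an ordered list; the function name `select_best_k`, the gene names, and the toy accuracy values are all hypothetical, and the scoring callback stands in for whatever classifier evaluation the paper actually uses.

```python
from typing import Callable, List, Sequence, Tuple


def select_best_k(
    ranked_genes: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the top-k prefix of ranked_genes with the highest score.

    ranked_genes is assumed to be ordered from most to least important.
    Ties go to the smaller subset, since fewer genes is the goal.
    """
    if not ranked_genes:
        raise ValueError("ranked_genes must not be empty")
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_genes) + 1):
        subset = list(ranked_genes[:k])
        score = score_subset(subset)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical ranking (most important first) and a toy scoring table
# standing in for a real classifier's accuracy at each subset size.
ranking = ["gene_a", "gene_b", "gene_c", "gene_d"]
toy_accuracy = {1: 0.80, 2: 0.90, 3: 0.90, 4: 0.85}

best_genes, best_acc = select_best_k(ranking, lambda s: toy_accuracy[len(s)])
print(best_genes, best_acc)
```

In practice the callback would train and evaluate a classifier on the selected columns; breaking ties toward the smaller subset is a design choice of this sketch, not something the abstract specifies.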
