Bibliographic Details
Main Authors: Rajabinasab, Muhammad; Houle, Michael E.; Chelly, Oussama; Zimek, Arthur
Format: Preprint
Published: 2026
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2605.22973
author Rajabinasab, Muhammad
Houle, Michael E.
Chelly, Oussama
Zimek, Arthur
contents Many novel unsupervised feature selection methods are proposed each year, yet their empirical evaluation is limited to supervised and unsupervised evaluation metrics computed on selected datasets, along with comparisons to existing methods. However, in the absence of an established evaluation baseline, it is difficult to determine the value each of these methods adds to the existing literature and how effective their underlying approaches are. We propose using random feature selection as a baseline for evaluating unsupervised feature selection methods. We empirically show that many state-of-the-art unsupervised feature selection methods are outperformed by random feature selection in both performance and efficiency. Accordingly, we emphasize the strict requirement of considering random feature selection as a baseline when developing novel unsupervised feature selection methods, to ensure a consistent improvement over random feature selection.
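A minimal Python sketch of the random-baseline idea described in the abstract, under stated assumptions: this record does not specify the paper's evaluation protocol, so the scoring function, the number of random trials, and the helper names (`random_feature_selection`, `random_baseline`, `compare_to_random`) are hypothetical. It draws k features uniformly at random many times, scores each subset with a caller-supplied function, and reports how a candidate method's score compares with that random distribution.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence


def random_feature_selection(n_features: int, k: int, rng: random.Random) -> List[int]:
    """Baseline selector: k distinct feature indices drawn uniformly at random."""
    if not 0 < k <= n_features:
        raise ValueError("k must satisfy 0 < k <= n_features")
    return sorted(rng.sample(range(n_features), k))


def random_baseline(
    score: Callable[[Sequence[int]], float],
    n_features: int,
    k: int,
    n_trials: int = 100,
    seed: int = 0,
) -> List[float]:
    """Score n_trials independent random subsets of size k (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [score(random_feature_selection(n_features, k, rng)) for _ in range(n_trials)]


def compare_to_random(method_score: float, baseline_scores: Sequence[float]) -> Dict[str, float]:
    """Summarize the random-baseline distribution and how often the method strictly beats it."""
    if not baseline_scores:
        raise ValueError("baseline_scores must be non-empty")
    sd = statistics.stdev(baseline_scores) if len(baseline_scores) > 1 else 0.0
    beaten = sum(method_score > s for s in baseline_scores) / len(baseline_scores)
    return {
        "baseline_mean": statistics.mean(baseline_scores),
        "baseline_std": sd,
        "fraction_beaten": beaten,
    }


if __name__ == "__main__":
    # Hypothetical toy setup: features 0-4 are "informative"; the score is the
    # fraction of selected features that are informative (higher is better).
    informative = set(range(5))

    def toy_score(subset: Sequence[int]) -> float:
        return len(informative & set(subset)) / len(subset)

    baseline = random_baseline(toy_score, n_features=20, k=3, n_trials=200)
    method_subset = [0, 1, 7]  # stand-in for some method's selected features
    print(compare_to_random(toy_score(method_subset), baseline))
```

Drawing many random subsets rather than a single one gives a distribution to compare against, so a method can be judged by the fraction of random subsets it beats instead of by one lucky or unlucky draw.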
format Preprint
id arxiv_https___arxiv_org_abs_2605_22973
institution arXiv
publishDate 2026
record_format arxiv
title Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2605.22973