| Main Authors: | Rajabinasab, Muhammad; Houle, Michael E.; Chelly, Oussama; Zimek, Arthur |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2605.22973 |
| _version_ | 1866917522582798336 |
|---|---|
| author | Rajabinasab, Muhammad; Houle, Michael E.; Chelly, Oussama; Zimek, Arthur |
| author_facet | Rajabinasab, Muhammad; Houle, Michael E.; Chelly, Oussama; Zimek, Arthur |
| contents | Many novel unsupervised feature selection methods are proposed each year, yet their empirical evaluation is limited to supervised and unsupervised evaluation metrics computed on selected datasets, along with comparisons to existing methods. In the absence of an established evaluation baseline, it is difficult to determine how much value each of these methods adds to the existing literature, and how effective their underlying approaches are. We propose using random feature selection as a baseline for evaluating unsupervised feature selection methods. We empirically show that many state-of-the-art methods in unsupervised feature selection are outperformed by random feature selection in both performance and efficiency. Accordingly, we emphasize that random feature selection must be considered as a baseline when developing novel unsupervised feature selection methods, to ensure a consistent improvement over it. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_22973 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2605.22973 |
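
The abstract above proposes random feature selection as an evaluation baseline. The following is a minimal sketch of what such a baseline could look like, not the authors' implementation: the function names, the repeat count, and the `score_fn` callback are all hypothetical and chosen only for illustration. It draws several random feature subsets of size `k` and scores each one, so that a new method can be compared against a distribution of random results rather than a single draw.

```python
import random


def random_feature_selection(n_features, k, seed=None):
    """Return k distinct feature indices drawn uniformly at random (sorted)."""
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_features), k))


def select_columns(rows, selected):
    """Keep only the selected feature columns of a row-major dataset."""
    return [[row[j] for j in selected] for row in rows]


def random_baseline_scores(rows, k, score_fn, n_repeats=10, seed=0):
    """Score n_repeats random subsets of k features.

    score_fn is a hypothetical callback: it takes the reduced dataset and
    returns a number (for example a clustering quality metric).
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    scores = []
    for _ in range(n_repeats):
        # Derive a fresh seed per repeat so the whole run is reproducible.
        selected = random_feature_selection(
            n_features, k, seed=rng.randrange(2**32)
        )
        scores.append(score_fn(select_columns(rows, selected)))
    return scores
```

Under these assumptions, a proposed selection method would be scored with the same `score_fn` and the same `k`, and its result compared against the mean and spread of the values returned by `random_baseline_scores`.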