Bibliographic Details
Main Authors: Abbas, Abdullah; Heep, Michael; Sperle, Theodor
Format: Preprint
Published: 2025
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2509.26448
author Abbas, Abdullah
Heep, Michael
Sperle, Theodor
contents The selection of datasets in recommender systems research lacks a systematic methodology. Researchers often select datasets based on popularity rather than empirical suitability. We developed the APS Explorer, a web application that implements the Algorithm Performance Space (APS) framework for informed dataset selection. The system analyzes 96 datasets using 28 algorithms across three metrics (nDCG, Hit Ratio, Recall) at five K-values. We extend the APS framework with a statistics-based classification system that assigns datasets to five difficulty levels based on quintiles. We also introduce a variance-normalized distance metric, based on the Mahalanobis distance, to measure similarity between datasets. The APS Explorer was successfully developed with three interactive modules for visualizing algorithm performance, directly comparing algorithms, and analyzing dataset metadata. This tool shifts dataset selection from intuition-based to evidence-based practice, and it is publicly available at datasets.recommender-systems.com.
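The abstract names two statistical extensions (quintile-based difficulty levels and a variance-normalized, Mahalanobis-based distance) without giving their formulas. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, for one metric at one K, each dataset is a vector of per-algorithm scores; that a dataset's difficulty score is its mean over algorithms; that "variance-normalized" means a Mahalanobis distance with a diagonal covariance matrix built from per-algorithm variances across datasets; and that zero-variance dimensions are skipped. The function names and level labels are hypothetical.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, pvariance, quantiles

# Hypothetical labels for the five levels; the preprint's names may differ.
LEVELS = ["very hard", "hard", "medium", "easy", "very easy"]


def difficulty_levels(perf: dict[str, list[float]]) -> dict[str, str]:
    """Assign each dataset one of five quintile-based difficulty levels.

    perf maps a dataset name to its per-algorithm scores for one metric at one K.
    Assumption: a dataset's difficulty score is its mean score over all algorithms,
    so lower average performance means a harder dataset.
    """
    scores = {name: mean(vals) for name, vals in perf.items()}
    cuts = quantiles(scores.values(), n=5)  # four cut points -> five bins
    # bisect_right counts the cut points <= score, giving a bin index 0..4.
    return {name: LEVELS[bisect_right(cuts, s)] for name, s in scores.items()}


def algorithm_variances(perf: dict[str, list[float]]) -> list[float]:
    """Population variance of each algorithm's score across all datasets."""
    return [pvariance(column) for column in zip(*perf.values())]


def variance_normalized_distance(
    a: list[float], b: list[float], variances: list[float]
) -> float:
    """Mahalanobis distance with a diagonal covariance matrix (standardized Euclidean).

    Assumption of this sketch: dimensions with zero variance carry no
    information and are skipped to avoid division by zero.
    """
    if not (len(a) == len(b) == len(variances)):
        raise ValueError("vectors and variances must have the same length")
    return sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances) if v > 0))
```

Dividing each squared difference by that algorithm's variance keeps algorithms with naturally wider score ranges from dominating the distance; a full-covariance Mahalanobis distance would additionally account for correlations between algorithms.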
format Preprint
id arxiv_https___arxiv_org_abs_2509_26448
institution arXiv
publishDate 2025
record_format arxiv
title Informed Dataset Selection
topic Information Retrieval
url https://arxiv.org/abs/2509.26448