Bibliographic Details
Main Authors: Franco, Mario; Febres, Gerardo; Fernández, Nelson; Gershenson, Carlos
Format: Preprint
Published: 2025
Subjects: Machine Learning; Information Theory
Online Access: https://arxiv.org/abs/2502.08041
author Franco, Mario
Febres, Gerardo
Fernández, Nelson
Gershenson, Carlos
contents Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independent of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classificability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap and aligns with human intuition, serving as an upper bound on classification performance. For a given problem, our results establish a theoretical limit beyond which no classifier can improve classification accuracy, regardless of architecture or amount of data. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.
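The record does not give the authors' formula, so the following is only a minimal sketch of the general idea the abstract describes: uncertainty in class assignments given features, and the accuracy ceiling that class overlap imposes. It assumes discrete (hashable) feature values and uses the standard empirical conditional entropy H(Y|X), which may differ from the paper's classificability measure. The function name `conditional_entropy_and_accuracy_bound` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy_and_accuracy_bound(features, labels):
    """Return (H(Y|X) in bits, best achievable accuracy) on this sample.

    Assumption: features are discrete and hashable. This is the textbook
    empirical conditional entropy, not necessarily the paper's measure.
    """
    n = len(labels)

    # Count how often each label occurs for each distinct feature value.
    by_feature = defaultdict(Counter)
    for x, y in zip(features, labels):
        by_feature[x][y] += 1

    entropy = 0.0
    best_correct = 0
    for counts in by_feature.values():
        total = sum(counts.values())
        for c in counts.values():
            p = c / total
            # Weight each group's label entropy by the group's frequency.
            entropy -= (total / n) * p * log2(p)
        # No classifier can do better than predicting the majority label
        # for points that share identical features.
        best_correct += max(counts.values())

    return entropy, best_correct / n


# Toy data: feature "a" carries two different labels (class overlap),
# feature "b" is unambiguous.
features = ["a", "a", "b", "b"]
labels = [0, 1, 0, 0]
h, acc = conditional_entropy_and_accuracy_bound(features, labels)
print(f"H(Y|X) = {h:.2f} bits, accuracy ceiling = {acc:.2f}")
```

In this toy sample, the two points with feature "a" cannot both be classified correctly by any function of the features, which is the kind of intrinsic limit the abstract refers to.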
format Preprint
id arxiv_https___arxiv_org_abs_2502_08041
institution arXiv
publishDate 2025
record_format arxiv
title The Art of Misclassification: Too Many Classes, Not Enough Points
topic Machine Learning
Information Theory
url https://arxiv.org/abs/2502.08041