Staff View: The Art of Misclassification: Too Many Classes, Not Enough Points :: Library Catalog

Bibliographic Details
Main Authors:	Franco, Mario; Febres, Gerardo; Fernández, Nelson; Gershenson, Carlos
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Information Theory
Online Access:	https://arxiv.org/abs/2502.08041

_version_	1866917155476340736
author	Franco, Mario; Febres, Gerardo; Fernández, Nelson; Gershenson, Carlos
author_facet	Franco, Mario; Febres, Gerardo; Fernández, Nelson; Gershenson, Carlos
contents	Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classificability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap and aligns with human intuition, serving as an upper bound on classification performance for classification problems. Our results establish a theoretical limit beyond which no classifier can improve the classification accuracy, regardless of the architecture or amount of data, in a given problem. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_08041
institution	arXiv
publishDate	2025
record_format	arxiv
title	The Art of Misclassification: Too Many Classes, Not Enough Points
topic	Machine Learning; Information Theory
url	https://arxiv.org/abs/2502.08041
