Staff View: Probabilistic Scores of Classifiers, Calibration is not Enough :: Library Catalog

Bibliographic Details
Main Authors:	Machado, Agathe Fernandes; Charpentier, Arthur; Flachaire, Emmanuel; Gallic, Ewen; Hu, François
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2408.03421

_version_	1866911980426625024
author	Machado, Agathe Fernandes; Charpentier, Arthur; Flachaire, Emmanuel; Gallic, Ewen; Hu, François
author_facet	Machado, Agathe Fernandes; Charpentier, Arthur; Flachaire, Emmanuel; Gallic, Ewen; Hu, François
contents	In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications such as predicting payment defaults or assessing medical risks. The model must then be well-calibrated to ensure alignment between predicted probabilities and actual outcomes. However, when score heterogeneity deviates from the underlying data probability distribution, traditional calibration metrics lose reliability, failing to align score distribution with actual probabilities. In this study, we highlight approaches that prioritize optimizing the alignment between predicted scores and true probability distributions over minimizing traditional performance or calibration metrics. When employing tree-based models such as Random Forest and XGBoost, our analysis emphasizes the flexibility these models offer in tuning hyperparameters to minimize the Kullback-Leibler (KL) divergence between predicted and true distributions. Through extensive empirical analysis across 10 UCI datasets and simulations, we demonstrate that optimizing tree-based models based on KL divergence yields superior alignment between predicted scores and actual probabilities without significant performance loss. In real-world scenarios, the reference probability is determined a priori as a Beta distribution estimated through maximum likelihood. Conversely, minimizing traditional calibration metrics may lead to suboptimal results, characterized by notable performance declines and inferior KL values. Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_03421
institution	arXiv
publishDate	2024
record_format	arxiv
title	Probabilistic Scores of Classifiers, Calibration is not Enough
topic	Machine Learning
url	https://arxiv.org/abs/2408.03421

Similar Items