Bibliographic Details
Main Author: Sewell, Roger
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2404.15764
author Sewell, Roger
author_facet Sewell, Roger
contents Shannon defined the mutual information between two variables. We illustrate why the true mutual information between a variable and the predictions made by a prediction algorithm is not a suitable measure of prediction quality, but the apparent Shannon mutual information (ASI) is; indeed it is the unique prediction quality measure with either of two very different lists of desirable properties, as previously shown by de Finetti and other authors. However, estimating the uncertainty of the ASI is a difficult problem, because the distribution of the individual values of $j(x,y)=\log\frac{Q_y(x)}{P(x)}$ has long, asymmetric heavy tails. We propose a Bayesian modelling method for the distribution of $j(x,y)$, from the posterior distribution of which the uncertainty in the ASI can be inferred. This method is based on Dirichlet-based mixtures of skew-Student distributions. We illustrate its use on data from a Bayesian model for prediction of the recurrence time of prostate cancer. We believe that this approach is generally appropriate for most problems where it is infeasible to derive the explicit distribution of the samples of $j(x,y)$, though the precise modelling parameters may need adjustment to suit particular cases.
format Preprint
id arxiv_https___arxiv_org_abs_2404_15764
institution arXiv
publishDate 2024
record_format arxiv
title Assessment of the quality of a prediction
topic Statistics Theory
Methodology
62B10 (Primary) 62F15, 62J20, 62P10 (Secondary)
url https://arxiv.org/abs/2404.15764