Bibliographic Details
Main Authors: Pohle, Marc-Oliver, Dimitriadis, Timo, Wermuth, Jan-Lukas
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2403.17580
author Pohle, Marc-Oliver
Dimitriadis, Timo
Wermuth, Jan-Lukas
contents Measuring dependence between two events, or equivalently between two binary random variables, amounts to expressing the dependence structure inherent in a $2\times 2$ contingency table in a real number between $-1$ and $1$. Countless such dependence measures exist, but there is little theoretical guidance on how they compare or on their advantages and shortcomings. Thus, practitioners might be overwhelmed by the problem of choosing a suitable measure. We provide a set of natural desirable properties that a proper dependence measure should fulfill. We show that Yule's Q and the little-known Cole coefficient are proper, while the most widely used measures, the phi coefficient and all contingency coefficients, are improper. They have a severe attainability problem, that is, even under perfect dependence they can be very far away from $-1$ and $1$, and they often differ substantially from the proper measures in that they understate the strength of dependence. The structural reason is that these are measures of equality of events rather than of dependence. We derive the (in some instances non-standard) limiting distributions of the measures and illustrate how asymptotically valid confidence intervals can be constructed. In a case study on drug consumption we demonstrate how misleading conclusions may arise from the use of improper dependence measures.
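The attainability problem described in the abstract can be illustrated with a small sketch that uses the textbook formulas for Yule's Q, (ad - bc)/(ad + bc), and the phi coefficient, (ad - bc)/sqrt((a+b)(c+d)(a+c)(b+d)), on a 2x2 table [[a, b], [c, d]]. This is not the authors' code. The function names are hypothetical, the counts are made up for illustration, and "perfect dependence" is taken here to mean that one event implies the other (an off-diagonal cell is zero), which is an assumption of this sketch. The Cole coefficient and the paper's confidence intervals are not reproduced.

```python
import math


def yules_q(a: float, b: float, c: float, d: float) -> float:
    """Yule's Q for the 2x2 table [[a, b], [c, d]].

    Rows: event A occurs / does not occur; columns: event B occurs / does not.
    Q = (ad - bc) / (ad + bc); undefined when ad + bc == 0.
    """
    den = a * d + b * c
    if den == 0:
        raise ValueError("Yule's Q is undefined when ad + bc = 0")
    return (a * d - b * c) / den


def phi_coefficient(a: float, b: float, c: float, d: float) -> float:
    """Phi coefficient: (ad - bc) divided by the root of the margin product."""
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if den == 0:
        raise ValueError("phi is undefined when a row or column margin is zero")
    return (a * d - b * c) / den


# Hypothetical counts: b = 0, so A never occurs without B (A implies B),
# but B (50 of 100) is much more common than A (10 of 100).
a, b, c, d = 10, 0, 40, 50
print(f"Yule's Q = {yules_q(a, b, c, d):.3f}")          # Yule's Q = 1.000
print(f"phi      = {phi_coefficient(a, b, c, d):.3f}")  # phi      = 0.333
```

With these margins (P(A) = 0.1, P(B) = 0.5), 1/3 is the largest value phi can reach, so phi stays far from 1 even though A implies B, while Yule's Q attains 1.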
format Preprint
id arxiv_https___arxiv_org_abs_2403_17580
institution arXiv
publishDate 2024
record_format arxiv
title Measuring Dependence between Events
topic Methodology
url https://arxiv.org/abs/2403.17580