Bibliographic Details
Main Authors: AghahosseinaliShirazi, Zahra, Rangel, Pedro A., de Souza, Camila P. E.
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2406.00245
author AghahosseinaliShirazi, Zahra
Rangel, Pedro A.
de Souza, Camila P. E.
contents Zero-inflated count data arise in various fields, including health, biology, economics, and the social sciences. These data are often modelled using probabilistic distributions such as zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), or zero-inflated binomial (ZIB). To account for heterogeneity in the data, it is often useful to cluster observations into groups that may explain underlying differences in the data-generating process. This paper focuses on model-based clustering for zero-inflated counts when observations are structured in a matrix form rather than a vector. We propose a clustering framework based on mixtures of ZIP or ZINB distributions, with both the count and zero components depending on cluster assignments. Our approach incorporates covariates through a log-linear structure for the mean parameter and includes a size factor to adjust for differences in total sampling or exposure. Model parameters and cluster assignments are estimated via the Expectation-Maximization (EM) algorithm. We assess the performance of our proposed methodology through simulation studies evaluating clustering accuracy and estimator properties, followed by applications to publicly available datasets.
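As a rough illustration of the approach the abstract describes, the sketch below computes the zero-inflated Poisson (ZIP) log-probability and a single EM E-step (posterior cluster probabilities) for the rows of a count matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the names zip_logpmf, e_step, size, lam, pi and weights are hypothetical, the log-linear mean is reduced to an intercept-only form (mean = size factor times a cluster-specific rate, with no covariates), and the zero-inflation probability is assumed to be one value per cluster. The record does not give the paper's exact parameterization.

```python
import numpy as np
from math import lgamma

# Vectorised log(y!) via the log-gamma function.
_log_factorial = np.vectorize(lambda v: lgamma(v + 1.0), otypes=[float])


def zip_logpmf(y, lam, pi):
    """Log-pmf of a zero-inflated Poisson distribution.

    P(Y = 0) = pi + (1 - pi) * exp(-lam)
    P(Y = y) = (1 - pi) * Poisson(y; lam)   for y > 0
    Assumes 0 <= pi < 1 and lam > 0.
    """
    y = np.asarray(y, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), y.shape)
    log_pois = -lam + y * np.log(lam) - _log_factorial(y)
    log_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    return np.where(y == 0, log_zero, np.log1p(-pi) + log_pois)


def e_step(Y, size, lam, pi, weights):
    """Posterior cluster probabilities for each row of Y (hypothetical layout).

    Y       : (n, G) matrix of counts, one row per observation
    size    : (n,)   size factors (exposure) per observation
    lam     : (K, G) cluster-specific rates; the mean is size[i] * lam[k, j]
    pi      : (K,)   cluster-specific zero-inflation probabilities
    weights : (K,)   mixing proportions, summing to 1
    """
    n, _ = Y.shape
    K = lam.shape[0]
    log_r = np.empty((n, K))
    for k in range(K):
        mu = size[:, None] * lam[k][None, :]  # (n, G) matrix of means
        log_r[:, k] = np.log(weights[k]) + zip_logpmf(Y, mu, pi[k]).sum(axis=1)
    # Normalise in log space for numerical stability.
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)
```

A full EM fit would alternate this E-step with an M-step that updates the rates, zero-inflation probabilities and mixing proportions. That part is omitted here because it depends on details of the model that the abstract does not spell out.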
format Preprint
id arxiv_https___arxiv_org_abs_2406_00245
institution arXiv
publishDate 2024
record_format arxiv
title Model-based Clustering of Multi-Dimensional Zero-Inflated Counts via the EM Algorithm
topic Methodology
Applications
url https://arxiv.org/abs/2406.00245