Staff View: Structure Learning for Directed Trees with Zero-Inflated Compositional Nodes :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Shuangjie; Mallick, Bani K.; Ni, Yang
Format:	Preprint
Published:	2026
Subjects:	Methodology
Online Access:	https://arxiv.org/abs/2605.03178

_version_	1866911654349897728
author	Zhang, Shuangjie; Mallick, Bani K.; Ni, Yang
contents	Compositional data, which are vectors of proportions constrained to the probability simplex, arise frequently in modern scientific applications, including microbiome relative abundances across body sites and cell-type mixture weights derived from single-cell genomics. While regression methods for compositional data are well developed, no existing graphical model framework addresses the problem of learning conditional dependence structures among multiple compositional vectors. This paper introduces a novel framework for directed tree structure learning over compositional nodes. We employ the Kullback-Leibler divergence as the scoring function and model the conditional expectation of each child composition as a mixture of a baseline composition and a parent-driven component parameterized by a column-stochastic transition matrix. This formulation respects the simplex geometry, handles zero-inflated compositions gracefully, and, combined with a non-degeneracy condition on the transition matrix, ensures identifiability of edge directions from observational data. We prove consistency of structure recovery and derive finite-sample guarantees that characterize the required sample size in terms of the signal gap, node dimension, and penalty level. The efficacy of our approach is demonstrated through simulations and applications to multi-site microbiome data and single-cell data, yielding interpretable directed structures that align with known biological mechanisms.
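The abstract describes the model only in words. As a hedged illustration, the sketch below assumes one concrete reading of it: for a parent composition x and a child composition y, E[y | x] = (1 - lam) * b + lam * A x, where b is a baseline composition, A is column-stochastic and lam lies in [0, 1]. The score is the summed Kullback-Leibler divergence KL(y_i || mu_i) with the convention 0 * log 0 = 0, so zero entries of a child simply drop out. The EM fitting routine, the edge gain (KL reduction over a baseline-only model) and the brute-force search over rooted spanning trees are illustrative choices, not the authors' estimator. The paper's penalty term and non-degeneracy condition are not modeled, so (b, A, lam) need not be unique here; only the fitted means and scores are used. All names (kl_score, mixture_mean, fit_mixture, edge_gains, best_tree, simulate_child) are hypothetical.

```python
import itertools
import numpy as np

EPS = 1e-12  # numerical floor for divisions and logs (assumption of this sketch)


def kl_score(Y, M):
    """Sum_i KL(y_i || mu_i) with 0 * log(0 / mu) = 0, so zero entries of Y drop out."""
    pos = Y > 0
    return float(np.sum(Y[pos] * np.log(Y[pos] / np.maximum(M[pos], EPS))))


def mixture_mean(X, b, A, lam):
    """Assumed model mu_i = (1 - lam) * b + lam * A @ x_i (rows stay on the simplex)."""
    return (1.0 - lam) * b[None, :] + lam * X @ A.T


def fit_mixture(Y, X, n_iter=300):
    """EM for (b, A, lam) that never increases kl_score(Y, mixture_mean(X, b, A, lam)).
    Y: (n, q) child compositions, X: (n, p) parent compositions."""
    n, q = Y.shape
    b, A, lam = Y.mean(axis=0), np.full((q, X.shape[1]), 1.0 / q), 0.5
    for _ in range(n_iter):
        W = Y / np.maximum(mixture_mean(X, b, A, lam), EPS)  # y_ik / mu_ik
        r0 = W * ((1.0 - lam) * b)[None, :]  # y-mass credited to the baseline
        rj = W[:, :, None] * lam * X[:, None, :] * A[None, :, :]  # credited to parent part j
        b = r0.sum(axis=0) / max(r0.sum(), EPS)
        colsum = rj.sum(axis=(0, 1))
        # Renormalize each column of A; keep a column unchanged if it received no mass.
        A = np.where(colsum > EPS, rj.sum(axis=0) / np.maximum(colsum, EPS), A)
        lam = float(rj.sum() / n)  # valid because each row of Y sums to 1
    return b, A, lam


def edge_gains(nodes, n_iter=300):
    """G[u, v]: KL reduction for child v from adding parent u to a baseline-only
    model (whose KL-optimal baseline is the column mean of v)."""
    d = len(nodes)
    G = np.zeros((d, d))
    for v, Y in enumerate(nodes):
        null = kl_score(Y, np.broadcast_to(Y.mean(axis=0), Y.shape))
        for u, X in enumerate(nodes):
            if u != v:
                b, A, lam = fit_mixture(Y, X, n_iter)
                G[u, v] = null - kl_score(Y, mixture_mean(X, b, A, lam))
    return G


def _reaches_root(parent, v, root, d):
    """Follow parent pointers from v; False if a cycle prevents reaching the root."""
    for _ in range(d):
        if v == root:
            return True
        v = parent[v]
    return v == root


def best_tree(G):
    """Exhaustive search over rooted spanning trees maximizing total gain (small d only)."""
    d = G.shape[0]
    best = (-np.inf, None, None)
    for root in range(d):
        others = [v for v in range(d) if v != root]
        choices = [[u for u in range(d) if u != v] for v in others]
        for combo in itertools.product(*choices):
            parent = dict(zip(others, combo))
            if all(_reaches_root(parent, v, root, d) for v in others):
                score = sum(G[parent[v], v] for v in others)
                if score > best[0]:
                    best = (score, root, parent)
    return best


# Usage on simulated data: a chain 0 -> 1 -> 2 of 3-part compositions.
rng = np.random.default_rng(0)
T = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])  # column-stochastic


def simulate_child(X, b, lam=0.7, conc=50.0):
    """Dirichlet noise around the assumed mixture mean, then crude zero inflation."""
    Y = np.array([rng.dirichlet(conc * m) for m in mixture_mean(X, b, T, lam)])
    Y[Y < 0.02] = 0.0
    return Y / Y.sum(axis=1, keepdims=True)


Z0 = rng.dirichlet(np.ones(3), size=300)
Z1 = simulate_child(Z0, np.array([0.5, 0.3, 0.2]))
Z2 = simulate_child(Z1, np.array([0.2, 0.3, 0.5]))
score, root, parent = best_tree(edge_gains([Z0, Z1, Z2]))
print(root, parent)
```

The exhaustive search is only practical for a handful of nodes. For larger graphs, a maximum-weight arborescence algorithm (Chu-Liu/Edmonds) is the usual substitute. Whether the printed root and parent map match the simulated chain depends on sample size and signal strength, so treat the output as illustrative.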
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_03178
institution	arXiv
publishDate	2026
record_format	arxiv
title	Structure Learning for Directed Trees with Zero-Inflated Compositional Nodes
topic	Methodology
url	https://arxiv.org/abs/2605.03178

Similar Items