Bibliographic Details
Main Authors: Liu, Wei; Zhong, Qingzhi
Format: Preprint
Published: 2025
Subjects: Methodology
Online Access: https://arxiv.org/abs/2507.09889
_version_ 1866912481001078784
author Liu, Wei
Zhong, Qingzhi
author_facet Liu, Wei
Zhong, Qingzhi
contents Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus on either multi-study integration or multi-modality integration, which makes them insufficient for analyzing diverse modalities measured across multiple studies. To address this limitation, we introduce a high-dimensional generalized factor model that integrates multi-modality data from multiple studies while also accommodating additional covariates. We investigate the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by four large latent random matrices, we use a variational lower bound, constructed from a variational posterior distribution, to approximate the observed log-likelihood. By profiling the variational parameters, we establish the asymptotic properties of the estimators of the model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational EM algorithm to carry out the estimation and a criterion to determine the optimal numbers of study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency. The R package for the proposed method is publicly available at https://CRAN.R-project.org/package=MMGFM.
format Preprint
id arxiv_https___arxiv_org_abs_2507_09889
institution arXiv
publishDate 2025
record_format arxiv
title High-Dimensional Multi-Study Multi-Modality Covariate-Augmented Generalized Factor Model
topic Methodology
url https://arxiv.org/abs/2507.09889