Saved in:
Bibliographic Details
Main Author: Qing, Huan
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2408.05535
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866916352850132992
author Qing, Huan
author_facet Qing, Huan
contents Traditional categorical data, often collected in psychological tests and educational assessments, are typically single-layer and gathered only once.This paper considers a more general case, multi-layer categorical data with polytomous responses. To model such data, we present a novel statistical model, the multi-layer latent class model (multi-layer LCM). This model assumes that all layers share common subjects and items. To discover subjects' latent classes and other model parameters under this model, we develop three efficient spectral methods based on the sum of response matrices, the sum of Gram matrices, and the debiased sum of Gram matrices, respectively. Within the framework of multi-layer LCM, we demonstrate the estimation consistency of these methods under mild conditions regarding data sparsity. Our theoretical findings reveal two key insights: (1) increasing the number of layers can enhance the performance of the proposed methods, highlighting the advantages of considering multiple layers in latent class analysis; (2) we theoretically show that the algorithm based on the debiased sum of Gram matrices usually performs best. Additionally, we propose an approach that combines the averaged modularity metric with our methods to determine the number of latent classes. Extensive experiments are conducted to support our theoretical results and show the powerfulness of our methods in the task of learning latent classes and estimating the number of latent classes in multi-layer categorical data with polytomous responses.
format Preprint
id arxiv_https___arxiv_org_abs_2408_05535
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Latent class analysis for multi-layer categorical data
Qing, Huan
Machine Learning
Traditional categorical data, often collected in psychological tests and educational assessments, are typically single-layer and gathered only once.This paper considers a more general case, multi-layer categorical data with polytomous responses. To model such data, we present a novel statistical model, the multi-layer latent class model (multi-layer LCM). This model assumes that all layers share common subjects and items. To discover subjects' latent classes and other model parameters under this model, we develop three efficient spectral methods based on the sum of response matrices, the sum of Gram matrices, and the debiased sum of Gram matrices, respectively. Within the framework of multi-layer LCM, we demonstrate the estimation consistency of these methods under mild conditions regarding data sparsity. Our theoretical findings reveal two key insights: (1) increasing the number of layers can enhance the performance of the proposed methods, highlighting the advantages of considering multiple layers in latent class analysis; (2) we theoretically show that the algorithm based on the debiased sum of Gram matrices usually performs best. Additionally, we propose an approach that combines the averaged modularity metric with our methods to determine the number of latent classes. Extensive experiments are conducted to support our theoretical results and show the powerfulness of our methods in the task of learning latent classes and estimating the number of latent classes in multi-layer categorical data with polytomous responses.
title Latent class analysis for multi-layer categorical data
topic Machine Learning
url https://arxiv.org/abs/2408.05535