Bibliographic Details
Main Authors: Gonzalez-Alvarado, Daniel; Cassel, Jonas; Petra, Stefania; Schnörr, Christoph
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.21831
_version_ 1866917465980665856
author Gonzalez-Alvarado, Daniel
Cassel, Jonas
Petra, Stefania
Schnörr, Christoph
author_facet Gonzalez-Alvarado, Daniel
Cassel, Jonas
Petra, Stefania
Schnörr, Christoph
contents We propose a geometric latent-subspace framework for generative modeling of discrete data. Specifically, we introduce latent subspaces in the exponential parameter space of product manifolds of categorical distributions as a novel method for learning generative models of discrete data. The resulting low-dimensional latent space encodes statistical dependencies and removes redundant degrees of freedom among the categorical variables. We equip the parameter domain with a Riemannian geometry such that the latent subspace and the induced data manifold are related by isometries, enabling consistent flow matching. Exploiting this structure, we propose a geometry-aware dimensionality reduction objective, called geometric PCA (GPCA), which we formulate as a regularized cross-entropy minimization that encourages small Riemannian distances between the data and their reconstructions. In particular, under the induced geometry, geodesics become straight lines in the latent parameter space, which makes model training by flow matching effective. Empirical results show that low-dimensional latent representations suffice to accurately model high-dimensional discrete data.
format Preprint
id arxiv_https___arxiv_org_abs_2601_21831
institution arXiv
publishDate 2026
record_format arxiv
title Generative Modeling of Discrete Data Using Geometric Latent Subspaces
topic Machine Learning
url https://arxiv.org/abs/2601.21831