Bibliographic Details
Main Authors: Causin, Paola, Marta, Alessio
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2507.07291
author Causin, Paola
Marta, Alessio
contents High-dimensional datasets often exhibit low-dimensional geometric structures, as suggested by the manifold hypothesis, which implies that data lie on a smooth manifold embedded in a higher-dimensional ambient space. While this insight underpins many advances in machine learning and inverse problems, fully leveraging it requires addressing three key tasks: estimating the intrinsic dimension (ID) of the manifold, constructing appropriate local coordinates, and learning mappings between ambient and manifold spaces. In this work, we propose a framework that addresses all these challenges using a Mixture of Variational Autoencoders (VAEs) and tools from Riemannian geometry. We specifically focus on estimating the ID of datasets by analyzing the numerical rank of the VAE decoder pullback metric. The estimated ID guides the construction of an atlas of local charts using a mixture of invertible VAEs, enabling accurate manifold parameterization and efficient inference. We show how this approach enhances solutions to ill-posed inverse problems, particularly in biomedical imaging, by enforcing that reconstructions lie on the learned manifold. Lastly, we explore the impact of network pruning on manifold geometry and reconstruction quality, showing that the intrinsic dimension serves as an effective proxy for monitoring model capacity.
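The rank-based ID estimate mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical toy `decoder` in place of a trained VAE decoder, a central finite-difference Jacobian in place of automatic differentiation, and Gaussian elimination with full pivoting in place of a singular value decomposition. The function names and the tolerance `rel_tol` are illustrative choices. The idea shown is that the pullback metric G = J^T J of the decoder Jacobian J becomes singular when the latent space has more dimensions than the data manifold, so the numerical rank of G estimates the intrinsic dimension.

```python
import math


def decoder(z):
    """Hypothetical toy decoder R^3 -> R^4 that ignores z[2],
    so its image is a 2-dimensional surface."""
    u, v = z[0], z[1]
    return [math.cos(u), math.sin(u), v, u * v]


def jacobian(f, z, h=1e-6):
    """Central finite-difference Jacobian of f at z, as a list of rows (D x d)."""
    cols = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(zp), f(zm))])
    # cols[j][i] holds d f_i / d z_j; transpose so each row is one output.
    return [list(row) for row in zip(*cols)]


def pullback_metric(J):
    """G = J^T J: the Euclidean metric of the ambient space pulled back
    to the latent space through the decoder."""
    d = len(J[0])
    return [[sum(row[a] * row[b] for row in J) for b in range(d)]
            for a in range(d)]


def numerical_rank(G, rel_tol=1e-6):
    """Count pivots larger than rel_tol times the first (largest) pivot,
    using Gaussian elimination with full pivoting."""
    A = [row[:] for row in G]
    n = len(A)
    rank = 0
    first_pivot = None
    for k in range(n):
        # Largest remaining entry becomes the pivot.
        p, q = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        pivot = abs(A[p][q])
        if first_pivot is None:
            first_pivot = pivot
        if pivot == 0.0 or pivot <= rel_tol * first_pivot:
            break
        A[k], A[p] = A[p], A[k]          # row swap
        for row in A:                    # column swap
            row[k], row[q] = row[q], row[k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
        rank += 1
    return rank


z = [0.7, -0.3, 1.2]                     # arbitrary latent point
G = pullback_metric(jacobian(decoder, z))
print(numerical_rank(G))                 # 2 for this toy decoder
```

In practice the rank would presumably be evaluated at many latent points and aggregated, and the threshold separating "zero" from "nonzero" directions is a choice that affects the estimate.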
format Preprint
id arxiv_https___arxiv_org_abs_2507_07291
institution arXiv
publishDate 2025
record_format arxiv
title Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems
topic Machine Learning
url https://arxiv.org/abs/2507.07291