Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: Shi, Songxuan
Format: Preprint
Veröffentlicht: 2025
Schlagworte:
Online-Zugang:https://arxiv.org/abs/2507.17255
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
_version_ 1866909747498713088
author Shi, Songxuan
author_facet Shi, Songxuan
contents Variational Autoencoder is typically understood from the perspective of probabilistic inference. In this work, we propose a new geometric reinterpretation which complements the probabilistic view and enhances its intuitiveness. We demonstrate that the proper construction of semantic manifolds arises primarily from the constraining effect of the KL divergence on the encoder. We view the latent representations as a Gaussian ball rather than deterministic points. Under the constraint of KL divergence, Gaussian ball regularizes the latent space, promoting a more uniform distribution of encodings. Furthermore, we show that reparameterization establishes a critical contractual mechanism between the encoder and decoder, enabling the decoder to learn how to reconstruct from these stochastic regions. We further connect this viewpoint with VQ-VAE, offering a unified perspective: VQ-VAE can be seen as an autoencoder where encodings are constrained to a set of cluster centers, with its generative capability arising from the compactness rather than its stochasticity. This geometric framework provides a new lens for understanding how VAE shapes the latent geometry to enable effective generation.
format Preprint
id arxiv_https___arxiv_org_abs_2507_17255
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle From Points to Spheres: A Geometric Reinterpretation of Variational Autoencoders
Shi, Songxuan
Machine Learning
Variational Autoencoder is typically understood from the perspective of probabilistic inference. In this work, we propose a new geometric reinterpretation which complements the probabilistic view and enhances its intuitiveness. We demonstrate that the proper construction of semantic manifolds arises primarily from the constraining effect of the KL divergence on the encoder. We view the latent representations as a Gaussian ball rather than deterministic points. Under the constraint of KL divergence, Gaussian ball regularizes the latent space, promoting a more uniform distribution of encodings. Furthermore, we show that reparameterization establishes a critical contractual mechanism between the encoder and decoder, enabling the decoder to learn how to reconstruct from these stochastic regions. We further connect this viewpoint with VQ-VAE, offering a unified perspective: VQ-VAE can be seen as an autoencoder where encodings are constrained to a set of cluster centers, with its generative capability arising from the compactness rather than its stochasticity. This geometric framework provides a new lens for understanding how VAE shapes the latent geometry to enable effective generation.
title From Points to Spheres: A Geometric Reinterpretation of Variational Autoencoders
topic Machine Learning
url https://arxiv.org/abs/2507.17255