Bibliographic Details
Main Authors: Bénézet, Cyril; Cheng, Ziteng; Jaimungal, Sebastian
Format: Preprint
Published: 2024
Subjects: Machine Learning; Statistics Theory
Online Access: https://arxiv.org/abs/2406.09375
author Bénézet, Cyril
Cheng, Ziteng
Jaimungal, Sebastian
contents We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing for different dimensions of the feature and target spaces. Our approach involves clustering data near varying query points in the feature space to create empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors. We establish upper bounds for the convergence rates of both methods and, from these bounds, deduce optimal configurations for the radius and the number of neighbors. We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice. For efficiency, our training process utilizes approximate nearest neighbors search with random binary space partitioning. Additionally, we employ the Sinkhorn algorithm and a sparsity-enforced transport plan. Our empirical findings demonstrate that, with a suitably designed structure, the neural network has the ability to adapt to a suitable level of Lipschitz continuity locally. For reproducibility, our code is available at https://github.com/zcheng-a/LCD_kNN.
format Preprint
id arxiv_https___arxiv_org_abs_2406_09375
institution arXiv
publishDate 2024
record_format arxiv
title Learning conditional distributions on continuous spaces
topic Machine Learning
Statistics Theory
url https://arxiv.org/abs/2406.09375
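note The two clustering schemes named in the abstract (fixed-radius ball and nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' code (see their repository for that). It assumes Euclidean distance and exact brute-force search rather than the approximate search the abstract mentions; the function names, the synthetic data, and the values of k and r are hypothetical and chosen only for illustration.

```python
import numpy as np


def ball_cluster(X, Y, x, r):
    """Return target samples whose features lie within distance r of query x.

    Assumption: Euclidean distance; the record does not state the norm.
    """
    dist = np.linalg.norm(X - x, axis=1)
    return Y[dist <= r]


def knn_cluster(X, Y, x, k):
    """Return target samples of the k features closest to query x.

    Exact brute-force search, used here only for clarity.
    """
    dist = np.linalg.norm(X - x, axis=1)
    idx = np.argpartition(dist, k - 1)[:k]  # indices of the k smallest distances
    return Y[idx]


def empirical_measure(atoms):
    """Represent the empirical measure as (atoms, uniform weights)."""
    n = len(atoms)
    if n == 0:
        raise ValueError("no samples fell into the cluster")
    return atoms, np.full(n, 1.0 / n)


# Hypothetical synthetic data on unit boxes: 2-D features, 1-D targets.
rng = np.random.default_rng(0)
n, d_x, d_y = 2000, 2, 1
X = rng.random((n, d_x))
noise = 0.05 * rng.standard_normal((n, d_y))
Y = np.clip(X.mean(axis=1, keepdims=True) + noise, 0.0, 1.0)

query = np.array([0.5, 0.5])
atoms_knn, w_knn = empirical_measure(knn_cluster(X, Y, query, k=50))
atoms_ball, w_ball = empirical_measure(ball_cluster(X, Y, query, r=0.1))
```

Each call returns the atoms and uniform weights of an empirical measure in the target space, approximating the conditional distribution at the query point. The nearest-neighbors variant always uses exactly k atoms, while the number of atoms in the ball variant depends on how many samples fall near the query.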