Staff View: Parameter-Efficient Distributional RL via Normalizing Flows and a Geometry-Aware Cramér Surrogate :: Library Catalog

Bibliographic Details
Main Authors:	C., Simo Alami; Kaddah, Rim; Read, Jesse; Cani, Marie-Paule
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2505.04310

_version_	1866913089745584128
author	C., Simo Alami; Kaddah, Rim; Read, Jesse; Cani, Marie-Paule
author_facet	C., Simo Alami; Kaddah, Rim; Read, Jesse; Cani, Marie-Paule
contents	Distributional Reinforcement Learning (DistRL) improves upon expectation-based methods by modeling full return distributions, but standard approaches often remain far from parsimonious. Categorical methods (e.g., C51) rely on fixed supports where parameter counts scale linearly with resolution, while quantile methods approximate distributions as discrete mixtures whose piecewise-constant densities can be wasteful when modeling complex multi-modal or heavy-tailed returns. We introduce NFDRL, a parsimonious architecture that models return distributions using continuous normalizing flows. Unlike categorical baselines, our flow-based model maintains a compact parameter footprint that does not grow with the effective resolution of the distribution, while providing a dynamic, adaptive support for returns. To train this continuous representation, we propose a Cramér-inspired, geometry-aware distance defined over probability masses obtained from the flow. We show that this distance is a true probability metric, that the associated distributional Bellman operator is a sqrt(gamma)-contraction, and that the resulting objective admits unbiased sample gradients, properties that are typically not simultaneously guaranteed in prior PDF-based DistRL methods. Empirically, NFDRL recovers rich, multi-modal return landscapes on toy MDPs and achieves performance competitive with categorical baselines on the Atari-5 benchmark, while offering substantially better parameter efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_04310
institution	arXiv
publishDate	2025
record_format	arxiv
title	Parameter-Efficient Distributional RL via Normalizing Flows and a Geometry-Aware Cramér Surrogate
topic	Artificial Intelligence; Machine Learning; Optimization and Control
url	https://arxiv.org/abs/2505.04310
