Saved in:
Bibliographic Details
Main Authors: Bourdrez, Constant, Vérine, Alexandre, Cappé, Olivier
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.08689
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • Diffusion models generate samples through an iterative denoising process guided by a pretrained neural network. Once the denoiser is fixed, the sampling algorithm itself (noise schedules, guidance scales, stochasticity profiles) still requires careful tuning, a process typically carried out through costly empirical grid search. In this work, we introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser. We formulate the diffusion sampling procedure as a discrete-time finite-horizon Markov Decision Process, where actions correspond to optional modifications of the sampling dynamics. To optimize action scheduling, we avoid defining an explicit reward function and instead directly match the target behavior expected from the sampler using policy gradient techniques. We provide experimental evidence that this approach matches fine-tuned samplers and comes at a modest cost compared to grid search: on ImageNet-64, a single training run replaces exhaustive search at up to 9x lower cost, with only 16% overhead at inference.