Bibliographic Details
Main Authors: Han, Jiaqi; Wang, Austin; Xu, Minkai; Chu, Wenda; Dang, Meihua; Ye, Haotian; Chen, Huayu; Yue, Yisong; Ermon, Stefano
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2507.04832
Abstract: Discrete diffusion models have shown great promise in modeling diverse sequence data, from human language to biological sequences. Inspired by the success of reinforcement learning (RL) in language models, there is growing interest in further improving these models by aligning them with a given reward. In this work, we propose an offline preference optimization method for trajectory alignment of discrete diffusion models. Instead of applying the reward to the final output and backpropagating the gradient through the entire denoising process, we decompose the problem into a set of stepwise alignment objectives by matching the per-step posterior. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and, importantly, yields an equivalent optimal solution when the trajectory reward factorizes additively. Experiments across multiple domains, including DNA sequence design, protein inverse folding, and language modeling, consistently demonstrate the superiority of our approach. Notably, it achieves up to a 12% improvement in predicted activity over the most competitive RL-based baseline on DNA sequence design, and improves the GSM8K score of LLaDA-8B-Instruct from 78.6 to 81.2 for language modeling.
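As a rough illustration of what a stepwise, offline preference objective can look like, the Python sketch below applies a standard DPO-style pairwise loss independently at each denoising step and averages the per-step terms. It is not the authors' objective. The function names, the per-step log-probability inputs, and the `beta` temperature are hypothetical choices made for illustration only.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def stepwise_preference_loss(
    logp_win: Sequence[float],
    logp_lose: Sequence[float],
    ref_logp_win: Sequence[float],
    ref_logp_lose: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Average of per-step DPO-style losses (hypothetical illustration only).

    Each sequence holds, for every denoising step t, the log-probability that a
    model assigns to the preferred ("win") or dispreferred ("lose") sample's
    transition at that step. The trained model's values are compared against
    a frozen reference model's values. Not the paper's exact objective.
    """
    n = len(logp_win)
    if n == 0 or not (len(logp_lose) == len(ref_logp_win) == len(ref_logp_lose) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for lw, ll, rw, rl in zip(logp_win, logp_lose, ref_logp_win, ref_logp_lose):
        # Implicit reward margin at this step: how much more the trained model
        # favours the preferred transition than the reference model does.
        margin = beta * ((lw - rw) - (ll - rl))
        total += -log_sigmoid(margin)
    return total / n


if __name__ == "__main__":
    # Identical trained and reference log-probabilities give a zero margin.
    print(stepwise_preference_loss([-1.0, -2.0], [-1.0, -2.0], [-1.0, -2.0], [-1.0, -2.0]))
```

When the trained and reference log-probabilities coincide, every margin is zero and the loss equals log 2 (about 0.693). It falls below that value as the trained model favours the preferred transitions more strongly than the reference does. Because each term in this sketch depends only on one step's transition, no gradient has to flow through the full denoising chain.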
Institution: arXiv
Title: Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
Subjects: Machine Learning