Staff View: Discrete Diffusion Trajectory Alignment via Stepwise Decomposition :: Library Catalog

Bibliographic Details
Main Authors:	Han, Jiaqi; Wang, Austin; Xu, Minkai; Chu, Wenda; Dang, Meihua; Ye, Haotian; Chen, Huayu; Yue, Yisong; Ermon, Stefano
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2507.04832

_version_	1866917237525315584
author	Han, Jiaqi; Wang, Austin; Xu, Minkai; Chu, Wenda; Dang, Meihua; Ye, Haotian; Chen, Huayu; Yue, Yisong; Ermon, Stefano
author_facet	Han, Jiaqi; Wang, Austin; Xu, Minkai; Chu, Wenda; Dang, Meihua; Ye, Haotian; Chen, Huayu; Yue, Yisong; Ermon, Stefano
contents	Discrete diffusion models have demonstrated great promise in modeling various sequence data, ranging from human language to biological sequences. Inspired by the success of RL in language models, there is growing interest in further improving these models by aligning them with a given reward. In this work, we propose an offline preference optimization method for trajectory alignment of discrete diffusion models. Instead of applying the reward to the final output and backpropagating the gradient through the entire denoising process, we decompose the problem into a set of stepwise alignment objectives by matching the per-step posterior. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and, importantly, yields an equivalent optimal solution under additive factorization of the trajectory reward. Experiments across multiple domains, including DNA sequence design, protein inverse folding, and language modeling, consistently demonstrate the superiority of our approach. Notably, it achieves up to a 12% improvement over the most competitive RL-based baseline in terms of predicted activity on DNA sequence design, and further improves the GSM8K score from 78.6 to 81.2 on LLaDA-8B-Instruct for language modeling.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_04832
institution	arXiv
publishDate	2025
record_format	arxiv
title	Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
topic	Machine Learning
url	https://arxiv.org/abs/2507.04832

Similar Items