Bibliographic Details
Main Authors: Xiong, Yida, Chen, Jiameng, Li, Kun, Zhang, Hongzhi, Cai, Xiantao, Wu, Jia, Hu, Wenbin
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2510.10211
contents Molecular graph generation (MGG) is essentially a multi-class generative task: it predicts the categories of atoms and bonds under strict chemical and structural constraints. However, many prevailing diffusion paradigms learn to regress numerical embeddings and rely on a hard discretization rule during sampling to recover discrete labels. This creates a fundamental discrepancy between training and sampling: models are trained for point-wise numerical fidelity, whereas sampling depends on crossing categorical decision boundaries. The discrepancy forces the model to spend effort on intra-class variations that become irrelevant after discretization, ultimately compromising diversity, structural statistics, and generalization performance. We therefore propose TopBF, a unified framework that (i) performs MGG directly in continuous parameter distributions, (ii) learns graph-topological understanding through a Quasi-Wasserstein optimal-transport coupling under geodesic costs, and (iii) supports controllable, property-conditioned generation during sampling without retraining the base model. TopBF uses the cumulative distribution function (CDF) to compute the category probabilities induced by the Gaussian channel, thereby unifying the training objective with the sampling discretization operation. Experiments on QM9 and ZINC250k demonstrate superior structural fidelity and efficient generation.
format Preprint
id arxiv_https___arxiv_org_abs_2510_10211
institution arXiv
publishDate 2025
record_format arxiv
title Transport-Coupled Bayesian Flows for Molecular Graph Generation
topic Machine Learning
url https://arxiv.org/abs/2510.10211
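
Illustrative note on the abstract's CDF step: the sketch below shows, in general terms, how a Gaussian over a continuous value induces probabilities over discrete categories through differences of its CDF at bin edges, and how that agrees with a hard binning rule. It is not taken from the TopBF paper. The equal-width bins on [-1, 1], the tail handling, and the names `gaussian_cdf`, `category_probs`, and `hard_discretize` are all assumptions made for illustration.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def category_probs(mu, sigma, num_classes, lo=-1.0, hi=1.0):
    """Probability mass that N(mu, sigma^2) assigns to each of num_classes
    equal-width bins on [lo, hi]. The two outer bins absorb the tails so the
    masses sum to 1. (Assumed bin layout, not confirmed by the source.)"""
    width = (hi - lo) / num_classes
    probs = []
    for k in range(num_classes):
        left = lo + k * width
        right = left + width
        cdf_left = 0.0 if k == 0 else gaussian_cdf(left, mu, sigma)
        cdf_right = 1.0 if k == num_classes - 1 else gaussian_cdf(right, mu, sigma)
        probs.append(cdf_right - cdf_left)
    return probs


def hard_discretize(x, num_classes, lo=-1.0, hi=1.0):
    """Hard rule used at sampling time: index of the bin containing x,
    clamped to the valid range."""
    width = (hi - lo) / num_classes
    k = int((x - lo) // width)
    return min(max(k, 0), num_classes - 1)


if __name__ == "__main__":
    probs = category_probs(mu=0.6, sigma=0.2, num_classes=4)
    print([round(p, 4) for p in probs])
    print(hard_discretize(0.6, num_classes=4))
```

Under these assumptions, a cross-entropy loss on `category_probs` scores a prediction by the bin it would land in, which is the same quantity `hard_discretize` decides at sampling time. A point-wise regression loss, by contrast, also penalizes movement inside a bin.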