Bibliographic Details
Main Authors: Zhou, Cai, Chen, Zijie, Li, Zian, Wang, Jike, Jiang, Kaiyi, Li, Pan, Yu, Rose, Zhang, Muhan, Bates, Stephen, Jaakkola, Tommi
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.15022
author Zhou, Cai
Chen, Zijie
Li, Zian
Wang, Jike
Jiang, Kaiyi
Li, Pan
Yu, Rose
Zhang, Muhan
Bates, Stephen
Jaakkola, Tommi
contents Many generative tasks in chemistry and science involve distributions invariant to group symmetries (e.g., permutation and rotation). A common strategy enforces invariance and equivariance through architectural constraints such as equivariant denoisers and invariant priors. In this paper, we challenge this tradition through the alternative canonicalization perspective: first map each sample to an orbit representative with a canonical pose or order, train an unconstrained (non-equivariant) diffusion or flow model on the canonical slice, and finally recover the invariant distribution by sampling a random symmetry transform at generation time. Building on a formal quotient-space perspective, our work provides a comprehensive theory of canonical diffusion by proving: (i) the correctness, universality and superior expressivity of canonical generative models over invariant targets; (ii) canonicalization accelerates training by removing diffusion score complexity induced by group mixtures and reducing conditional variance in flow matching. We then show that aligned priors and optimal transport act complementarily with canonicalization and further improve training efficiency. We instantiate the framework for molecular graph generation under $S_n \times SE(3)$ symmetries. By leveraging geometric spectra-based canonicalization and mild positional encodings, canonical diffusion significantly outperforms equivariant baselines in 3D molecule generation tasks, with similar or even less computation. Moreover, with a novel architecture Canon, CanonFlow achieves state-of-the-art performance on the challenging GEOM-DRUG dataset, and the advantage remains large in few-step generation.
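The three-step recipe in the abstract (canonicalize, train an unconstrained model on the canonical slice, re-randomize at generation time) can be illustrated with a minimal sketch on 3D point clouds. This is not the paper's method: the abstract mentions a geometric spectra-based canonicalization, whereas the sketch below substitutes a simple PCA alignment and a coordinate sort as an assumed stand-in. The function names `canonicalize`, `random_rotation`, and `randomize` are hypothetical, and the generative model itself is omitted.

```python
import numpy as np


def canonicalize(x):
    """Map an (n, 3) point cloud to one orbit representative (illustrative only).

    Assumes a generic input: distinct covariance eigenvalues, nonzero third
    moments, and distinct first coordinates after alignment.
    """
    # Remove the translation by centering at the centroid.
    centered = x - x.mean(axis=0)
    # Principal axes from the (unnormalized) covariance matrix.
    _, axes = np.linalg.eigh(centered.T @ centered)
    # Fix the sign of the first two axes with the third moment of the projections.
    proj = centered @ axes
    signs = np.sign((proj[:, :2] ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    axes[:, :2] *= signs
    # Choose the third axis so the frame is a proper rotation (no reflection).
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1
    aligned = centered @ axes
    # Fix the permutation by sorting points along the first canonical axis.
    return aligned[np.argsort(aligned[:, 0], kind="stable")]


def random_rotation(rng):
    """Draw a rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q


def randomize(x_canon, rng):
    """Apply a random rotation and a random permutation to a canonical sample."""
    rot = random_rotation(rng)
    perm = rng.permutation(len(x_canon))
    return (x_canon @ rot.T)[perm]


# Usage: a rotated, permuted, shifted copy maps to the same representative.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
moved = randomize(x + np.array([1.0, -2.0, 0.5]), rng)
same = np.allclose(canonicalize(x), canonicalize(moved), atol=1e-6)
```

In this sketch an unconstrained model would be trained on the outputs of `canonicalize`, and each generated sample would be passed through `randomize` to restore rotation and permutation invariance. Translation is handled by centering rather than by random sampling, which is an assumption of the sketch.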
format Preprint
id arxiv_https___arxiv_org_abs_2602_15022
institution arXiv
publishDate 2026
record_format arxiv
title Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation
topic Machine Learning
Artificial Intelligence
Group Theory
Biomolecules
url https://arxiv.org/abs/2602.15022