Bibliographic Details
Main Authors: Liu, Mengdi, Gao, Zhangyang, Chang, Hong, Li, Stan Z., Shan, Shiguang, Chen, Xilin
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2502.04684
author Liu, Mengdi
Gao, Zhangyang
Chang, Hong
Li, Stan Z.
Shan, Shiguang
Chen, Xilin
contents Understanding how genes influence phenotype across species is a fundamental challenge in genetic engineering, and progress on it will facilitate advances in fields such as crop breeding, conservation biology, and personalized medicine. However, current phenotype prediction models are limited to individual species and depend on an expensive phenotype labeling process, making genotype-to-phenotype prediction a highly domain-dependent and data-scarce problem. To this end, we suggest taking images as morphological proxies, facilitating cross-species generalization through large-scale multimodal pretraining. We propose the first genotype-to-phenotype diffusion model (G2PDiffusion), which generates morphological images from DNA while considering two critical evolutionary signals, i.e., multiple sequence alignments (MSA) and environmental contexts. The model contains three novel components: 1) an MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module that improves genotype-phenotype consistency. Extensive experiments show that integrating evolutionary signals with environmental context enriches the model's understanding of phenotype variability across species, thereby offering a valuable and promising exploration into advanced AI-assisted genomic analysis.
format Preprint
id arxiv_https___arxiv_org_abs_2502_04684
institution arXiv
publishDate 2025
record_format arxiv
title G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2502.04684