Bibliographic Details
Main Authors: Song, Zhenqiao; Zhao, Yunlong; Shi, Wenxian; Jin, Wengong; Yang, Yang; Li, Lei
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2405.08205
_version_ 1866912935336476672
author Song, Zhenqiao
Zhao, Yunlong
Shi, Wenxian
Jin, Wengong
Yang, Yang
Li, Lei
contents Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and its three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation across an entire protein sequence and local influence from the nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective comprising a sequence generation loss, a position prediction loss, and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the Protein Data Bank (PDB). Experimental results show that EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes that bind to specific substrates with high affinity.
format Preprint
id arxiv_https___arxiv_org_abs_2405_08205
institution arXiv
publishDate 2024
record_format arxiv
title Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
topic Machine Learning
url https://arxiv.org/abs/2405.08205