Bibliographic Details
Main Authors: Yiming Zhang, Bian Bian, Manabu Okumura
Format: Article, Open Access
Published: Wiley 2024
Online Access: https://onlinelibrary.wiley.com/doi/10.1002/imo2.45
Table of Contents:
  • Hyena architecture enables fast and efficient protein language modeling
    Yiming Zhang, Bian Bian, Manabu Okumura
    iMetaOmics
    Abstract: The emergence of self‐supervised deep language models has revolutionized natural language processing and has recently extended to biological sequence analysis. Traditional language models, primarily based on Transformer architectures, are effective across many applications. However, these models are inherently constrained by the quadratic computational complexity of the attention mechanism, which limits their efficiency and leads to high computational costs. To address these limitations, we introduce ProtHyena, a novel approach that applies the Hyena operator to protein language modeling. The operator alternates between subquadratic long convolutions and element‐wise gating operations, which avoids the constraints imposed by attention and reduces computational complexity to subquadratic levels. This enables faster and more memory‐efficient modeling of protein sequences. With only 1.6 M parameters, ProtHyena achieves state‐of‐the‐art or comparable results on 8 downstream tasks: protein engineering (protein fluorescence and stability prediction), protein property prediction (neuropeptide cleavage, signal peptide, solubility, disorder, and gene function prediction), and protein structure prediction. The ProtHyena architecture is a highly efficient solution for protein language modeling and offers a promising avenue for fast analysis of protein sequences.
    DOI: 10.1002/imo2.45
    License: http://creativecommons.org/licenses/by/4.0/
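    The alternation of long convolutions and element‐wise gating described in the abstract can be illustrated with a minimal NumPy sketch. It is not the ProtHyena implementation: the function names (fft_long_conv, hyena_like_operator, make_decaying_filters), the random projection matrices, and the exponentially decayed random filters are assumptions made for illustration only. The sketch shows how a causal convolution over the full sequence length can be computed with an FFT in O(L log L) rather than O(L^2), and how each convolution output is multiplied element‐wise by a gate projected from the input.

```python
import numpy as np


def fft_long_conv(u, h):
    """Causal convolution of u (L, D) with a length-L filter h (L, D), per channel.

    Computed with the FFT, so the cost is O(L log L) per channel.
    """
    L = u.shape[0]
    n = 2 * L  # zero-pad so the circular FFT convolution does not wrap around
    U = np.fft.rfft(u, n=n, axis=0)
    H = np.fft.rfft(h, n=n, axis=0)
    return np.fft.irfft(U * H, n=n, axis=0)[:L]


def hyena_like_operator(u, proj_weights, filters):
    """Alternate long convolutions with element-wise gating.

    u:            input sequence, shape (L, D)
    proj_weights: list of N + 1 matrices of shape (D, D); the first produces
                  the value stream, the rest produce the N gates
                  (hypothetical stand-ins for learned projections)
    filters:      list of N long filters, each of shape (L, D)
    """
    z = u @ proj_weights[0]  # value stream
    for W, h in zip(proj_weights[1:], filters):
        gate = u @ W  # position-wise projection of the input
        z = gate * fft_long_conv(z, h)  # long convolution, then gating
    return z


def make_decaying_filters(rng, order, L, D, decay=0.05):
    """Hypothetical filters: random values under an exponential decay window."""
    window = np.exp(-decay * np.arange(L))[:, None]
    return [rng.standard_normal((L, D)) * window for _ in range(order)]


rng = np.random.default_rng(0)
L, D, order = 64, 8, 2  # sequence length, channel width, number of gated steps
u = rng.standard_normal((L, D))
proj_weights = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(order + 1)]
filters = make_decaying_filters(rng, order, L, D)

y = hyena_like_operator(u, proj_weights, filters)
print(y.shape)
```

    Because every step is either a causal convolution or a position‐wise product, the output at a given position depends only on inputs at that position and earlier ones, and no L × L attention matrix is ever formed.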