Bibliographic Details
Main Authors: Dubey, Neeru, Karlsson, Elin, Redondo, Miguel Angel, Reimegård, Johan, Rising, Anna, Kjellström, Hedvig
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2504.08437
contents The remarkable mechanical properties of spider silk, including its tensile strength and extensibility, are primarily governed by the repetitive regions of the proteins that constitute the fiber, the major ampullate spidroins (MaSps). However, establishing correlations between mechanical characteristics and repeat sequences is challenging due to the intricate sequence-structure-function relationships of MaSps and the limited availability of annotated datasets. In this study, we present a novel computational framework for designing MaSp repeat sequences with customizable mechanical properties. To achieve this, we developed a lightweight GPT-based generative model by distilling the pre-trained ProtGPT2 protein language model. The distilled model was subjected to multilevel fine-tuning using curated subsets of the Spider Silkome dataset. Specifically, we adapted the model for MaSp repeat generation using 6,000 MaSp repeat sequences and further refined it with 572 repeats associated with experimentally determined fiber-level mechanical properties. Our model generates biologically plausible MaSp repeat regions tailored to specific mechanical properties and also predicts those properties for given sequences. Validation included sequence-level analysis of physicochemical attributes, the expected distribution of key motifs, and secondary structure composition. A correlation study using BLAST on the Spider Silkome dataset and a test set of MaSp repeats with known mechanical properties further confirmed the predictive accuracy of the model. This framework advances the rational design of spider silk-inspired biomaterials, offering a versatile tool for engineering protein sequences with tailored mechanical attributes.
institution arXiv
title Customizing Spider Silk: Generative Models with Mechanical Property Conditioning for Protein Engineering
topic Machine Learning