Bibliographic Details
Main Authors: Etezadi, Fatemeh; Ito, Shunichi; Yasui, Kosuke; Abdalkader, Rodi Kado; Minami, Itsunari; Uesugi, Motonari; Namasivayam, Ganesh Pandian; Nakano, Haruko; Nakano, Atsushi; Packwood, Daniel M.
Format: Preprint
Published: 2024
Subjects: Biomolecules
Online Access: https://arxiv.org/abs/2407.15322
_version_ 1866916331601788928
author Etezadi, Fatemeh
Ito, Shunichi
Yasui, Kosuke
Abdalkader, Rodi Kado
Minami, Itsunari
Uesugi, Motonari
Namasivayam, Ganesh Pandian
Nakano, Haruko
Nakano, Atsushi
Packwood, Daniel M.
contents The discovery of small organic compounds for inducing stem cell differentiation is a time- and resource-intensive process. While data science could, in principle, facilitate the discovery of these compounds, novel approaches are required due to the difficulty of acquiring training data from large numbers of example compounds. In this paper, we demonstrate the design of a new compound for inducing cardiomyocyte differentiation using simple regression models trained with a data set containing only 80 examples. We introduce decorated shape descriptors, an information-rich molecular feature representation that integrates both molecular shape and hydrophilicity information. These models demonstrate improved performance compared to ones using standard molecular descriptors based on shape alone. Model overtraining is diagnosed using a new type of sensitivity analysis. Our new compound is designed using a conservative molecular design strategy, and its effectiveness is confirmed through expression profiles of cardiomyocyte-related marker genes using real-time polymerase chain reaction experiments on human iPS cell lines. This work demonstrates a viable data-driven strategy for designing new compounds for stem cell differentiation protocols and will be useful in situations where training data is limited.
format Preprint
id arxiv_https___arxiv_org_abs_2407_15322
institution arXiv
publishDate 2024
record_format arxiv
title Molecular design for cardiac cell differentiation using a small dataset and decorated shape features
topic Biomolecules
url https://arxiv.org/abs/2407.15322