Bibliographic Details
Main Authors: Chen, Benson, Sultan, Mohammad M., Karaletsos, Theofanis
Format: Preprint
Published: 2023
Subjects: Quantitative Methods; Machine Learning
Online Access: https://arxiv.org/abs/2310.13769
_version_ 1866914676997095424
author Chen, Benson
Sultan, Mohammad M.
Karaletsos, Theofanis
author_facet Chen, Benson
Sultan, Mohammad M.
Karaletsos, Theofanis
contents DNA-encoded library (DEL) technology has proven to be a powerful tool that uses combinatorially constructed small molecules to enable highly efficient screening assays. These selection experiments, which involve multiple stages of washing, elution, and identification of potent binders via unique DNA barcodes, often generate complex data. This complexity can mask the underlying signals, necessitating computational tools such as machine learning to uncover valuable insights. We introduce DEL-Compose, a compositional deep probabilistic model of DEL data that decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks and exploits the inherent hierarchical structure of these molecules by modeling latent reactions between embedded synthons. Additionally, we investigate methods to improve the observation models for DEL count data, such as integrating covariate factors to account more effectively for data noise. Across two popular public benchmark datasets (CA-IX and HRP), our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsically interpretable structure, thereby providing a robust tool for the analysis of DEL data.
format Preprint
id arxiv_https___arxiv_org_abs_2310_13769
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle Compositional Deep Probabilistic Models of DNA Encoded Libraries
Chen, Benson
Sultan, Mohammad M.
Karaletsos, Theofanis
Quantitative Methods
Machine Learning
title Compositional Deep Probabilistic Models of DNA Encoded Libraries
topic Quantitative Methods
Machine Learning
url https://arxiv.org/abs/2310.13769