Bibliographic Details
Main Authors: Maisuradze, Luka, King, Megan C., Surovtsev, Ivan V., Mochrie, Simon G. J., Shattuck, Mark D., O'Hern, Corey S.
Format: Preprint
Published: 2023
Subjects: Genomics
Online Access: https://arxiv.org/abs/2312.14342
author Maisuradze, Luka
King, Megan C.
Surovtsev, Ivan V.
Mochrie, Simon G. J.
Shattuck, Mark D.
O'Hern, Corey S.
contents Chromatin is a polymer complex of DNA and proteins that regulates gene expression. The three-dimensional structure and organization of chromatin control DNA transcription and replication. High-throughput chromatin conformation capture techniques generate Hi-C maps that can provide insight into the 3D structure of chromatin. A Hi-C map can be represented as a symmetric matrix in which each element is the average contact probability, or the number of contacts, between two chromatin loci. Previous studies have detected topologically associating domains (TADs), which are self-interacting regions in Hi-C maps within which the contact probability is greater than that outside the region. Many algorithms have been developed to identify TADs within Hi-C maps. However, most TAD identification algorithms are unable to identify nested or overlapping TADs, and for a given Hi-C map, the location and number of TADs vary significantly across methods. We develop a novel method, KerTAD, that uses a kernel-based technique from computer vision and image processing to accurately identify nested and overlapping TADs. We benchmark this method against state-of-the-art TAD identification methods on both synthetic and experimental data sets. We find that KerTAD consistently has higher true positive rates (TPR) and lower false discovery rates (FDR) than all tested methods for both synthetic and manually annotated experimental Hi-C maps. In contrast to the other methods, the TPR for KerTAD is also largely insensitive to increasing noise and sparsity. We also find that KerTAD is consistent in the number and size of TADs identified across replicate experimental Hi-C maps for several organisms. KerTAD will improve automated TAD identification and enable researchers to better correlate changes in TADs with biological phenomena, such as enhancer-promoter interactions and disease states.
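The abstract's working definition of a TAD (a block of the symmetric Hi-C matrix whose internal contact probability exceeds that of its surroundings) can be made concrete with a short Python sketch. It assumes a dense NumPy contact matrix indexed by genomic bin; the helper domain_contrast and its choice of flanking width are hypothetical and illustrate only the definition. This is not KerTAD's kernel-based detection, which the record does not describe in detail.

```python
import numpy as np


def domain_contrast(hic: np.ndarray, start: int, end: int) -> float:
    """Hypothetical helper (not KerTAD): ratio of the mean contact inside a
    candidate domain [start, end) to the mean contact between that domain
    and flanking bins of the same width on either side."""
    if hic.ndim != 2 or hic.shape[0] != hic.shape[1] or not np.allclose(hic, hic.T):
        raise ValueError("Hi-C map must be a symmetric square matrix")
    n = hic.shape[0]
    width = end - start
    if start < 0 or end > n or width < 2:
        raise ValueError("domain must lie inside the map and span at least 2 bins")
    # Off-diagonal contacts within the domain (upper triangle, diagonal excluded)
    iu = np.triu_indices(width, k=1)
    inside = hic[start:end, start:end][iu].mean()
    # Contacts between domain bins and up to `width` flanking bins per side
    left = hic[start:end, max(0, start - width):start]
    right = hic[start:end, end:min(n, end + width)]
    outside = np.concatenate([left.ravel(), right.ravel()])
    if outside.size == 0:
        raise ValueError("domain has no flanking bins")
    return float(inside / outside.mean())


# Synthetic 12-bin map: uniform background with one enriched block.
hic = np.ones((12, 12))
hic[4:8, 4:8] = 5.0
print(domain_contrast(hic, 4, 8))  # 5.0
print(domain_contrast(hic, 0, 4))  # 1.0
```

A ratio well above 1 marks a self-interacting block; a ratio near 1 means no enrichment relative to neighbouring bins. Nested or overlapping domains, which the abstract highlights, would need a method that scores many candidate blocks jointly rather than one interval at a time.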
institution arXiv
title Identifying topologically associating domains using differential kernels
topic Genomics