Saved in:
Bibliographic Details
Main Authors: Zhou, Tianshuo, Tang, Wei Yu, Malik, Apoorv, Mathews, David H., Huang, Liang
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2402.17206
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866913817599934464
author Zhou, Tianshuo
Tang, Wei Yu
Malik, Apoorv
Mathews, David H.
Huang, Liang
author_facet Zhou, Tianshuo
Tang, Wei Yu
Malik, Apoorv
Mathews, David H.
Huang, Liang
contents RNA design aims to find a sequence that folds with highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., "motifs") that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Our source code is available at https://github.com/shanry/RNA-Undesign and our web server is available at http://linearfold.org/motifs.
format Preprint
id arxiv_https___arxiv_org_abs_2402_17206
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Scalable and Interpretable Identification of Minimal Undesignable RNA Structure Motifs with Rotational Invariance
Zhou, Tianshuo
Tang, Wei Yu
Malik, Apoorv
Mathews, David H.
Huang, Liang
Data Structures and Algorithms
RNA design aims to find a sequence that folds with highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., "motifs") that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Our source code is available at https://github.com/shanry/RNA-Undesign and our web server is available at http://linearfold.org/motifs.
title Scalable and Interpretable Identification of Minimal Undesignable RNA Structure Motifs with Rotational Invariance
topic Data Structures and Algorithms
url https://arxiv.org/abs/2402.17206