Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Zhang, Zhou, Cao, Hanqun, Tan, Cheng, Wu, Fang, Heng, Pheng Ann, Fu, Tianfan
Format:	Preprint
Publié:	2026
Sujets:	Machine Learning
Accès en ligne:	https://arxiv.org/abs/2603.19636
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866914410245652480
author	Zhang, Zhou Cao, Hanqun Tan, Cheng Wu, Fang Heng, Pheng Ann Fu, Tianfan
author_facet	Zhang, Zhou Cao, Hanqun Tan, Cheng Wu, Fang Heng, Pheng Ann Fu, Tianfan
contents	Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce \emph{RiboSphere}, a framework that learns \emph{discrete} geometric representations of RNA by combining vector quantization with flow matching. Our design is motivated by the modular organization of RNA architecture: complex folds are composed from recurring structural motifs. RiboSphere uses a geometric transformer encoder to produce SE(3)-invariant (rotation/translation-invariant) features, which are discretized with finite scalar quantization (FSQ) into a finite vocabulary of latent codes. Conditioned on these discrete codes, a flow-matching decoder reconstructs atomic coordinates, enabling high-fidelity structure generation. We find that the learned code indices are enriched for specific RNA motifs, suggesting that the model captures motif-level compositional structure rather than acting as a purely compressive bottleneck. Across benchmarks, RiboSphere achieves strong performance in structure reconstruction (RMSD 1.25\,Å, TM-score 0.84), and its pretrained discrete representations transfer effectively to inverse folding and RNA--ligand binding prediction, with robust generalization in data-scarce regimes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_19636
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	RiboSphere: Learning Unified and Efficient Representations of RNA Structures Zhang, Zhou Cao, Hanqun Tan, Cheng Wu, Fang Heng, Pheng Ann Fu, Tianfan Machine Learning Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce \emph{RiboSphere}, a framework that learns \emph{discrete} geometric representations of RNA by combining vector quantization with flow matching. Our design is motivated by the modular organization of RNA architecture: complex folds are composed from recurring structural motifs. RiboSphere uses a geometric transformer encoder to produce SE(3)-invariant (rotation/translation-invariant) features, which are discretized with finite scalar quantization (FSQ) into a finite vocabulary of latent codes. Conditioned on these discrete codes, a flow-matching decoder reconstructs atomic coordinates, enabling high-fidelity structure generation. We find that the learned code indices are enriched for specific RNA motifs, suggesting that the model captures motif-level compositional structure rather than acting as a purely compressive bottleneck. Across benchmarks, RiboSphere achieves strong performance in structure reconstruction (RMSD 1.25\,Å, TM-score 0.84), and its pretrained discrete representations transfer effectively to inverse folding and RNA--ligand binding prediction, with robust generalization in data-scarce regimes.
title	RiboSphere: Learning Unified and Efficient Representations of RNA Structures
topic	Machine Learning
url	https://arxiv.org/abs/2603.19636

Documents similaires