Bibliographic Details
Main Authors: Carracedo-Cosme, Jaime, Romero-Muñiz, Carlos, Pou, Pablo, Pérez, Rubén
Format: Preprint
Published: 2022
Subjects: Materials Science, Disordered Systems and Neural Networks, Machine Learning
Online Access: https://arxiv.org/abs/2205.00449
author Carracedo-Cosme, Jaime
Romero-Muñiz, Carlos
Pou, Pablo
Pérez, Rubén
contents Despite being the main tool to visualize molecules at the atomic scale, AFM with CO-functionalized metal tips is unable to chemically identify the observed molecules. Here we present a strategy to address this challenging task using deep learning techniques. Instead of identifying a finite number of molecules following a traditional classification approach, we define the molecular identification as an image captioning problem. We design an architecture, composed of two multimodal recurrent neural networks, capable of identifying the structure and composition of an unknown molecule using a 3D-AFM image stack as input. The neural network is trained to provide the name of each molecule according to the IUPAC nomenclature rules. To train and test this algorithm we use the novel QUAM-AFM dataset, which contains almost 700,000 molecules and 165 million AFM images. The accuracy of the predictions is remarkable, achieving a high score quantified by the cumulative BLEU 4-gram, a common metric in language recognition studies.
format Preprint
id arxiv_https___arxiv_org_abs_2205_00449
institution arXiv
publishDate 2022
record_format arxiv
title Molecular Identification from AFM images using the IUPAC Nomenclature and Attribute Multimodal Recurrent Neural Networks
topic Materials Science
Disordered Systems and Neural Networks
Machine Learning
url https://arxiv.org/abs/2205.00449