Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Kim, Sangyeup, Kim, Nayeon, Piao, Yinhua, Kim, Sun
Format:	Preprint
Published:	2025
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.07655
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866917950157488128
author	Kim, Sangyeup Kim, Nayeon Piao, Yinhua Kim, Sun
author_facet	Kim, Sangyeup Kim, Nayeon Piao, Yinhua Kim, Sun
contents	Molecular language modeling tasks such as molecule captioning have been recognized for their potential to further understand molecular properties that can aid drug discovery or material synthesis based on chemical reactions. Unlike the common use of molecule graphs in predicting molecular properties, most methods in molecular language modeling rely heavily on SMILES sequences. This preference is because the task involves generating a sequence of multiple tokens using transformer-based models. Therefore, a main challenge is determining how to integrate graph data, which contains structural and spatial information about molecules, with text data. In addition, simply using both 1D SMILES text and 2D graph as inputs without addressing how they align and represent the molecule structure in different modalities makes it challenging to fully utilize structural knowledge about molecules. To this end, we propose GraphT5, a multi-modal framework that integrates 1D SMILES text and 2D graph representations of molecules for molecular language modeling. Specifically, we introduce a novel cross-token attention module in GraphT5 to bridge the gap arising from the fundamental differences between the two modalities of molecule representations. Cross-token attention exploits implicit information between SMILES and graphs of molecules, resulting from their interactions at a fine-grained token level that benefits molecular language modeling. Extensive experiments including molecule captioning, IUPAC name prediction tasks, and case studies show that our GraphT5 outperforms the latest baseline approaches, which validates the effectiveness of our GraphT5 in sufficiently utilizing 1D SMILES text and 2D graph representations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_07655
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	GraphT5: Unified Molecular Graph-Language Modeling via Multi-Modal Cross-Token Attention Kim, Sangyeup Kim, Nayeon Piao, Yinhua Kim, Sun Machine Learning Artificial Intelligence Molecular language modeling tasks such as molecule captioning have been recognized for their potential to further understand molecular properties that can aid drug discovery or material synthesis based on chemical reactions. Unlike the common use of molecule graphs in predicting molecular properties, most methods in molecular language modeling rely heavily on SMILES sequences. This preference is because the task involves generating a sequence of multiple tokens using transformer-based models. Therefore, a main challenge is determining how to integrate graph data, which contains structural and spatial information about molecules, with text data. In addition, simply using both 1D SMILES text and 2D graph as inputs without addressing how they align and represent the molecule structure in different modalities makes it challenging to fully utilize structural knowledge about molecules. To this end, we propose GraphT5, a multi-modal framework that integrates 1D SMILES text and 2D graph representations of molecules for molecular language modeling. Specifically, we introduce a novel cross-token attention module in GraphT5 to bridge the gap arising from the fundamental differences between the two modalities of molecule representations. Cross-token attention exploits implicit information between SMILES and graphs of molecules, resulting from their interactions at a fine-grained token level that benefits molecular language modeling. Extensive experiments including molecule captioning, IUPAC name prediction tasks, and case studies show that our GraphT5 outperforms the latest baseline approaches, which validates the effectiveness of our GraphT5 in sufficiently utilizing 1D SMILES text and 2D graph representations.
title	GraphT5: Unified Molecular Graph-Language Modeling via Multi-Modal Cross-Token Attention
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2503.07655

Similar Items