Staff View: On the Scalability of GNNs for Molecular Graphs :: Library Catalog


Bibliographic Details
Main Authors:	Sypetkowski, Maciej; Wenkel, Frederik; Poursafaei, Farimah; Dickson, Nia; Suri, Karush; Fradkin, Philip; Beaini, Dominique
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2404.11568

_version_	1866913496493457408
author	Sypetkowski, Maciej; Wenkel, Frederik; Poursafaei, Farimah; Dickson, Nia; Suri, Karush; Fradkin, Philip; Beaini, Dominique
author_facet	Sypetkowski, Maciej; Wenkel, Frederik; Poursafaei, Farimah; Dickson, Nia; Suri, Karush; Fradkin, Philip; Beaini, Dominique
contents	Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) have yet to show the benefits of scale, mainly due to the lower efficiency of sparse operations, large data requirements, and a lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from increasing depth, width, number of molecules, number of labels, and diversity of the pretraining datasets. We further demonstrate strong finetuning scaling behavior on 38 highly competitive downstream tasks, outclassing previous large models. This gives rise to MolGPS, a new graph foundation model for navigating chemical space, which outperforms the previous state of the art on 26 of the 38 downstream tasks. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_11568
institution	arXiv
publishDate	2024
record_format	arxiv
title	On the Scalability of GNNs for Molecular Graphs
topic	Machine Learning
url	https://arxiv.org/abs/2404.11568

Similar Items