Bibliographic Details
Main Authors: Gao, Zhangyang, Dong, Daize, Tan, Cheng, Xia, Jun, Hu, Bozhen, Li, Stan Z.
Format: Preprint
Published: 2024
Subjects: Machine Learning; Artificial Intelligence; Social and Information Networks
Online Access: https://arxiv.org/abs/2402.02464
author Gao, Zhangyang
Dong, Daize
Tan, Cheng
Xia, Jun
Hu, Bozhen
Li, Stan Z.
contents Can we model non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The non-Euclidean nature of graphs has posed a long-standing challenge in graph modeling. Although recent graph neural networks and graph transformers encode graphs as Euclidean vectors, recovering the original graph from those vectors remains difficult. In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable Graph Words in Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. We pretrain GraphsGPT on 100M molecules and report several findings: (1) The pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on 8 of 9 graph classification and regression tasks. (2) The pretrained GraphGPT serves as a strong graph generator, as demonstrated by its ability to perform both few-shot and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in Euclidean space, overcoming previously known non-Euclidean challenges. (4) The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph-domain tasks, excelling in both representation and generation. Code is available at https://github.com/A4Bio/GraphsGPT.
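Finding (3) rests on one idea: once two graphs are encoded as Graph Words of identical shape (K vectors of dimension d each), mixing them reduces to ordinary vector interpolation, which is not well defined on the raw graphs themselves. The sketch below illustrates only that interpolation step, under stated assumptions: the function names `mixup_graph_words` and `sample_lambda` are hypothetical, the Beta(alpha, alpha) draw for the mixing weight follows the common mixup convention rather than anything stated in this record, and the Graph2Seq encoder and GraphGPT decoder are not implemented here. It is not the authors' code.

```python
import random
from typing import List, Optional

# Hypothetical representation: K rows (Graph Words), each a d-dimensional vector.
Matrix = List[List[float]]


def mixup_graph_words(w_a: Matrix, w_b: Matrix, lam: float) -> Matrix:
    """Convexly interpolate two K x d Graph Word matrices.

    Assumes both graphs were encoded into the same number K of Graph Words
    with the same dimension d, so they can be combined element-wise.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(w_a) != len(w_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(w_a, w_b)
    ):
        raise ValueError("Graph Word matrices must have identical K x d shape")
    # lam weights graph A, (1 - lam) weights graph B, entry by entry.
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(w_a, w_b)
    ]


def sample_lambda(alpha: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Draw a mixing weight from Beta(alpha, alpha) (assumed mixup convention)."""
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    # Toy Graph Words with K = 2, d = 2 (made-up values for illustration).
    w_a = [[1.0, 0.0], [0.0, 1.0]]
    w_b = [[0.0, 2.0], [2.0, 0.0]]
    print(mixup_graph_words(w_a, w_b, 0.25))  # [[0.25, 1.5], [1.5, 0.25]]
```

In the setting the abstract describes, the mixed matrix would then be passed to a decoder such as GraphGPT to obtain a new graph; that step is outside this sketch.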
format Preprint
id arxiv_https___arxiv_org_abs_2402_02464
institution arXiv
publishDate 2024
record_format arxiv
title A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer
topic Machine Learning
Artificial Intelligence
Social and Information Networks
url https://arxiv.org/abs/2402.02464