Staff View: GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Wei, Yanbin; Fu, Shuai; Jiang, Weisen; Zhang, Zejian; Zeng, Zhixiong; Wu, Qi; Kwok, James T.; Zhang, Yu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.02130

_version_	1866929569886371840
author	Wei, Yanbin; Fu, Shuai; Jiang, Weisen; Zhang, Zejian; Zeng, Zhixiong; Wu, Qi; Kwok, James T.; Zhang, Yu
author_facet	Wei, Yanbin; Fu, Shuai; Jiang, Weisen; Zhang, Zejian; Zeng, Zhixiong; Wu, Qi; Kwok, James T.; Zhang, Yu
contents	Large Language Models (LLMs) are increasingly used for various tasks with graph structures. Though LLMs can process graph information in a textual format, they overlook the rich vision modality, which is an intuitive way for humans to comprehend structural information and conduct general graph reasoning. The potential benefits and capabilities of representing graph structures as visual images (i.e., $\textit{visual graph}$) are still unexplored. To fill the gap, we propose an end-to-end framework, called $\textbf{G}$raph to v$\textbf{I}$sual and $\textbf{T}$extual Integr$\textbf{A}$tion (GITA), which is the first to incorporate visual graphs into general graph reasoning. In addition, we establish the $\textbf{G}$raph-based $\textbf{V}$ision-$\textbf{L}$anguage $\textbf{Q}$uestion $\textbf{A}$nswering (GVLQA) dataset from existing graph data, which is the first vision-language dataset for general graph reasoning purposes. Extensive experiments on the GVLQA dataset and five real-world datasets show that GITA outperforms mainstream LLMs in terms of general graph reasoning capabilities. Moreover, we highlight the effectiveness of layout augmentation on visual graphs and of pretraining on the GVLQA dataset.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_02130
institution	arXiv
publishDate	2024
record_format	arxiv
title	GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
topic	Computation and Language
url	https://arxiv.org/abs/2402.02130

Similar Items