Staff View: CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation :: Library Catalog

Bibliographic Details
Main Authors:	Su, Daohan; Liu, Hao; Li, Xunkai; Zhu, Yinlin; Yongfu, Xiong; Liu, Yi; Qin, Hongchao; Li, Rong-Hua; Wang, Guoren
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.11468

_version_	1866913115666382848
author	Su, Daohan; Liu, Hao; Li, Xunkai; Zhu, Yinlin; Yongfu, Xiong; Liu, Yi; Qin, Hongchao; Li, Rong-Hua; Wang, Guoren
author_facet	Su, Daohan; Liu, Hao; Li, Xunkai; Zhu, Yinlin; Yongfu, Xiong; Liu, Yi; Qin, Hongchao; Li, Rong-Hua; Wang, Guoren
contents	Multimodal Graph Neural Networks (MGNNs) have shown strong potential for learning from multimodal attributed graphs, yet most existing approaches rely on tightly coupled architectures that suffer from prohibitive computational overhead. In this paper, we present a systematic empirical analysis showing that decoupled MGNNs are substantially more efficient and scalable for large-scale graph learning. However, we identify a critical bottleneck in existing decoupled pipelines, namely modal conflict, which arises in both the propagation and aggregation stages. Specifically, independent multi-hop diffusion causes cross-modal semantic divergence during propagation, while naive fusion fails to align multi-hop feature trajectories during aggregation, jointly limiting effective representation learning. To address this challenge, we propose CAMPA, a Cross-modal Aligned Multimodal Propagation & Aggregation framework for decoupled multimodal graph learning. Concretely, CAMPA introduces a two-stage alignment mechanism: (1) cross-modal aligned propagation, which injects cross-modal similarity priors into message passing to preserve semantic consistency without additional parameter overhead; (2) trajectory aligned aggregation, which leverages trajectory-level self-attention and cross-attention to capture and align long-range dependencies across modalities and hops. Extensive experiments on diverse benchmark datasets and tasks demonstrate that CAMPA consistently outperforms strong coupled and decoupled baselines while preserving the efficiency advantages of the decoupled paradigm.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_11468
institution	arXiv
publishDate	2026
record_format	arxiv
title	CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.11468
