Saved in:
Bibliographic Details
Main Authors: Xue, Leyan, Zhang, Changqing, Xue, Kecheng, Liu, Xiaohong, Wang, Guangyu, Han, Zongbo
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2511.06452
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866918485559345152
author Xue, Leyan
Zhang, Changqing
Xue, Kecheng
Liu, Xiaohong
Wang, Guangyu
Han, Zongbo
author_facet Xue, Leyan
Zhang, Changqing
Xue, Kecheng
Liu, Xiaohong
Wang, Guangyu
Han, Zongbo
contents Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.
format Preprint
id arxiv_https___arxiv_org_abs_2511_06452
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
Xue, Leyan
Zhang, Changqing
Xue, Kecheng
Liu, Xiaohong
Wang, Guangyu
Han, Zongbo
Machine Learning
Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.
title MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
topic Machine Learning
url https://arxiv.org/abs/2511.06452