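Staff View: Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs :: Library Catalog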

Bibliographic Details
Main Authors:	Zhang, Dingkun; Qi, Shuhan; Xiao, Xinyu; Chen, Kehai; Wang, Xuan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.07663

_version_	1866908605108715520
author	Zhang, Dingkun; Qi, Shuhan; Xiao, Xinyu; Chen, Kehai; Wang, Xuan
author_facet	Zhang, Dingkun; Qi, Shuhan; Xiao, Xinyu; Chen, Kehai; Wang, Xuan
contents	Recent advances in Multimodal Large Language Models (MLLMs) have enhanced their versatility as they integrate a growing number of modalities. Given the heavy cost of training MLLMs, it is efficient to reuse existing models and extend them to additional modalities through Modality-incremental Continual Learning (MCL). Research on MCL is still at an early stage. In this work, we investigate the causes of performance degradation in MCL. We find that it suffers not only from forgetting, as in traditional continual learning, but also from misalignment between the modality-agnostic and modality-specific components. To address both forgetting and misalignment, we propose a simple MCL paradigm called "MErge then ReAlign" (MERA). MERA neither introduces heavy model budgets nor modifies model architectures, so it is easy to deploy and highly reusable in the MLLM community. Extensive experiments demonstrate the strong performance of MERA: it maintains an average Backward Relative Gain of 99.84% when extending to four modalities, achieving nearly lossless MCL performance. Our findings highlight the misalignment issue in MCL. More broadly, our work shows how to adjust different components of MLLMs during continual learning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_07663
institution	arXiv
publishDate	2025
record_format	arxiv
title	Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2503.07663

Similar Items