Salvato in:
| Autore principale: | |
|---|---|
| Natura: | Preprint |
| Pubblicazione: |
2026
|
| Soggetti: | |
| Accesso online: | https://arxiv.org/abs/2601.09746 |
| Tags: |
Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
|
| _version_ | 1866908943484190720 |
|---|---|
| author | Xu, Philip |
| author_facet | Xu, Philip |
| contents | This paper introduces a novel Multi-Agent Cooperative Learning (MACL) framework to address cross-modal alignment collapse in vision-language models when handling out-of-distribution (OOD) concepts. Four core agents, including image, text, name, and coordination agents, collaboratively mitigate modality imbalance through structured message passing. The proposed framework enables multi-agent feature space name learning, incorporates a context exchange enhanced few-shot learning algorithm, and adopts an adaptive dynamic balancing mechanism to regulate inter-agent contributions. Experiments on the VISTA-Beyond dataset demonstrate that MACL significantly improves performance in both few-shot and zero-shot settings, achieving 1-5% precision gains across diverse visual domains. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2601_09746 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Multi-Agent Cooperative Learning for Robust Vision-Language Alignment under OOD Concepts Xu, Philip Multiagent Systems This paper introduces a novel Multi-Agent Cooperative Learning (MACL) framework to address cross-modal alignment collapse in vision-language models when handling out-of-distribution (OOD) concepts. Four core agents, including image, text, name, and coordination agents, collaboratively mitigate modality imbalance through structured message passing. The proposed framework enables multi-agent feature space name learning, incorporates a context exchange enhanced few-shot learning algorithm, and adopts an adaptive dynamic balancing mechanism to regulate inter-agent contributions. Experiments on the VISTA-Beyond dataset demonstrate that MACL significantly improves performance in both few-shot and zero-shot settings, achieving 1-5% precision gains across diverse visual domains. |
| title | Multi-Agent Cooperative Learning for Robust Vision-Language Alignment under OOD Concepts |
| topic | Multiagent Systems |
| url | https://arxiv.org/abs/2601.09746 |