Staff View: FedVLM: Scalable Personalized Vision-Language Models through Federated Learning :: Library Catalog

Bibliographic Details
Main Authors:	Mitra, Arkajyoti; Anjum, Afia; Agbaje, Paul; Pesé, Mert; Olufowobi, Habeeb
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.17088

_version_	1866914143186976768
author	Mitra, Arkajyoti; Anjum, Afia; Agbaje, Paul; Pesé, Mert; Olufowobi, Habeeb
author_facet	Mitra, Arkajyoti; Anjum, Afia; Agbaje, Paul; Pesé, Mert; Olufowobi, Habeeb
contents	Vision-language models (VLMs) demonstrate impressive zero-shot and few-shot learning capabilities, making them essential for several downstream tasks. However, fine-tuning these models at scale remains challenging, particularly in federated environments where data is decentralized and non-iid across clients. Existing parameter-efficient tuning methods like LoRA (Low-Rank Adaptation) reduce computational overhead but struggle with heterogeneous client data, leading to suboptimal generalization. To address these challenges, we propose FedVLM, a federated LoRA fine-tuning framework that enables decentralized adaptation of VLMs while preserving model privacy and reducing reliance on centralized training. To further tackle data heterogeneity, we introduce personalized LoRA (pLoRA), which dynamically adapts LoRA parameters to each client's unique data distribution, significantly improving local adaptation while maintaining global model aggregation. Experiments on the RLAIF-V dataset show that pLoRA improves client-specific performance by 24.5% over standard LoRA, demonstrating superior adaptation in non-iid settings. FedVLM provides a scalable and efficient solution for fine-tuning VLMs in federated settings, advancing personalized adaptation in distributed learning scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_17088
institution	arXiv
publishDate	2025
record_format	arxiv
title	FedVLM: Scalable Personalized Vision-Language Models through Federated Learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2507.17088
