Bibliographic Details
Main Authors: Mitra, Arkajyoti, Anjum, Afia, Agbaje, Paul, Pesé, Mert, Olufowobi, Habeeb
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2507.17088
author Mitra, Arkajyoti
Anjum, Afia
Agbaje, Paul
Pesé, Mert
Olufowobi, Habeeb
contents Vision-language models (VLMs) demonstrate impressive zero-shot and few-shot learning capabilities, making them essential for several downstream tasks. However, fine-tuning these models at scale remains challenging, particularly in federated environments where data is decentralized and non-IID across clients. Existing parameter-efficient tuning methods like LoRA (Low-Rank Adaptation) reduce computational overhead but struggle with heterogeneous client data, leading to suboptimal generalization. To address these challenges, we propose FedVLM, a federated LoRA fine-tuning framework that enables decentralized adaptation of VLMs while preserving model privacy and reducing reliance on centralized training. To further tackle data heterogeneity, we introduce personalized LoRA (pLoRA), which dynamically adapts LoRA parameters to each client's unique data distribution, significantly improving local adaptation while maintaining global model aggregation. Experiments on the RLAIF-V dataset show that pLoRA improves client-specific performance by 24.5% over standard LoRA, demonstrating superior adaptation in non-IID settings. FedVLM provides a scalable and efficient solution for fine-tuning VLMs in federated settings, advancing personalized adaptation in distributed learning scenarios.
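The abstract refers to federated LoRA fine-tuning in which client adapters are combined through global aggregation. As a rough illustration only, the Python sketch below shows generic FedAvg-style aggregation of per-client LoRA factors A and B, weighted by each client's sample count. It is not the FedVLM or pLoRA method described in the paper. The function names (`fedavg_lora`, `lora_delta`), the dict layout with keys "A" and "B", and the alpha/rank scaling convention (borrowed from the original LoRA formulation) are all hypothetical choices made for this example.

```python
from typing import Dict, List

# A matrix is a list of rows; plain lists keep the sketch dependency-free.
Matrix = List[List[float]]


def weighted_average(mats: List[Matrix], weights: List[float]) -> Matrix:
    """Element-wise weighted mean of equally shaped matrices."""
    total = float(sum(weights))
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(w * m[i][j] for m, w in zip(mats, weights)) / total for j in range(cols)]
        for i in range(rows)
    ]


def fedavg_lora(clients: List[Dict[str, Matrix]], num_samples: List[int]) -> Dict[str, Matrix]:
    """Hypothetical server step: average each LoRA factor ("A", "B") across
    clients, weighting every client by its number of local samples (FedAvg)."""
    return {
        name: weighted_average([c[name] for c in clients], num_samples)
        for name in clients[0]
    }


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Naive matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_delta(adapter: Dict[str, Matrix], alpha: float, rank: int) -> Matrix:
    """Low-rank update Delta W = (alpha / rank) * B @ A, which LoRA adds to the
    frozen base weight matrix."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(adapter["B"], adapter["A"])]


if __name__ == "__main__":
    # Two toy clients with rank-1 adapters for a 2x2 weight matrix.
    client_1 = {"A": [[1.0, 0.0]], "B": [[1.0], [0.0]]}
    client_2 = {"A": [[3.0, 2.0]], "B": [[3.0], [2.0]]}
    global_adapter = fedavg_lora([client_1, client_2], num_samples=[1, 3])
    print(global_adapter)  # {'A': [[2.5, 1.5]], 'B': [[2.5], [1.5]]}
    print(lora_delta(global_adapter, alpha=1.0, rank=1))
```

In this naive scheme, A and B are averaged separately, and the result does not in general equal the average of the clients' products B @ A. This mismatch is one known complication of plain federated LoRA when clients' data differ.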
format Preprint
id arxiv_https___arxiv_org_abs_2507_17088
institution arXiv
publishDate 2025
record_format arxiv
title FedVLM: Scalable Personalized Vision-Language Models through Federated Learning
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2507.17088