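Staff View: MSfusion: A Dynamic Model Splitting Approach for Resource-Constrained Machines to Collaboratively Train Larger Models :: Library Catalog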

Bibliographic Details
Main Authors:	Xie, Jin; Li, Songze
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.03622

_version_	1866916311958814720
author	Xie, Jin; Li, Songze
author_facet	Xie, Jin; Li, Songze
contents	Training large models requires a large amount of data, as well as abundant computation resources. While collaborative learning (e.g., federated learning) provides a promising paradigm to harness collective data from many participants, training large models remains a major challenge for participants with limited resources, such as mobile devices. We introduce MSfusion, an effective and efficient collaborative learning framework tailored for training larger models on resource-constrained machines through model splitting. Specifically, a double shifting model splitting scheme is designed such that, in each training round, each participant is assigned a subset of model parameters to train over local data and aggregates with the sub-models of other peers on their common parameters. While model splitting significantly reduces the computation and communication costs of individual participants, additional novel designs on adaptive model overlapping and contrastive loss functions help MSfusion maintain training effectiveness against model shift across participants. Extensive experiments on image and NLP tasks illustrate significant advantages of MSfusion in performance and efficiency for training large models, as well as its strong scalability: the computation cost of each participant decreases significantly as the number of participants increases.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_03622
institution	arXiv
publishDate	2024
record_format	arxiv
title	MSfusion: A Dynamic Model Splitting Approach for Resource-Constrained Machines to Collaboratively Train Larger Models
topic	Machine Learning
url	https://arxiv.org/abs/2407.03622
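
The abstract above describes each participant training a subset of the model parameters and aggregating with peers on the parameters they share. The following is a minimal sketch of that general idea only, not the paper's double shifting scheme: the sliding-window assignment, the function names `assign_indices` and `aggregate`, and the stand-in for local training are all assumptions made for illustration.

```python
from typing import Dict, List


def assign_indices(num_params: int, num_participants: int,
                   window: int, round_idx: int) -> List[List[int]]:
    """Hypothetical assignment rule: participant i receives a contiguous
    window of parameter indices (with wraparound) whose start position
    shifts by one each round."""
    stride = num_params // num_participants
    return [
        [(i * stride + round_idx + k) % num_params for k in range(window)]
        for i in range(num_participants)
    ]


def aggregate(global_params: List[float],
              sub_models: List[Dict[int, float]]) -> List[float]:
    """Average each parameter over the participants that trained it.
    Parameters that no participant trained keep their previous value."""
    sums = [0.0] * len(global_params)
    counts = [0] * len(global_params)
    for sub in sub_models:
        for idx, value in sub.items():
            sums[idx] += value
            counts[idx] += 1
    return [
        sums[j] / counts[j] if counts[j] else global_params[j]
        for j in range(len(global_params))
    ]


global_params = [0.0] * 6
assignment = assign_indices(num_params=6, num_participants=3,
                            window=3, round_idx=0)
# Stand-in for local training (assumption): participant i simply sets
# every parameter it holds to the value i + 1.
sub_models = [{idx: float(i + 1) for idx in idxs}
              for i, idxs in enumerate(assignment)]
new_params = aggregate(global_params, sub_models)
```

In this toy setup each participant holds three of the six parameters, so neighbouring windows overlap on one index, and only those overlapping indices are averaged across two participants.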
