Staff View: Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks :: Library Catalog

Bibliographic Details
Main Authors:	Du, Yichao; Zhang, Zhirui; Yue, Linan; Huang, Xu; Zhang, Yuqing; Xu, Tong; Xu, Linli; Chen, Enhong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2401.10070

_version_	1866917895820279808
author	Du, Yichao; Zhang, Zhirui; Yue, Linan; Huang, Xu; Zhang, Yuqing; Xu, Tong; Xu, Linli; Chen, Enhong
author_facet	Du, Yichao; Zhang, Zhirui; Yue, Linan; Huang, Xu; Zhang, Yuqing; Xu, Tong; Xu, Linli; Chen, Enhong
contents	To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach (i.e., FedAvg) in S2T tasks typically suffers from extensive communication overhead due to multi-round interactions based on the whole model and performance degradation caused by data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework that introduces FedLoRA, a lightweight LoRA module for client-side tuning and interaction with the server to minimize communication overhead, and FedMem, a global model equipped with a k-nearest-neighbor (kNN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments based on Conformer and Whisper backbone models on CoVoST and GigaSpeech benchmarks show that our approach significantly reduces the communication overhead on all S2T tasks and effectively personalizes the global model to overcome data heterogeneity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_10070
institution	arXiv
publishDate	2024
record_format	arxiv
title	Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
topic	Computation and Language; Sound; Audio and Speech Processing
url	https://arxiv.org/abs/2401.10070

Similar Items