Staff View: Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Hong, Mengze; Gu, Yi; Jiang, Di; Gu, Hanlin; Zhang, Chen Jason; Wang, Lu; Su, Zhiyang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.04945

_version_	1866917315970334720
author	Hong, Mengze; Gu, Yi; Jiang, Di; Gu, Hanlin; Zhang, Chen Jason; Wang, Lu; Su, Zhiyang
author_facet	Hong, Mengze; Gu, Yi; Jiang, Di; Gu, Hanlin; Zhang, Chen Jason; Wang, Lu; Su, Zhiyang
contents	Training automatic speech recognition (ASR) models increasingly relies on decentralized federated learning to ensure data privacy and accessibility, producing multiple local models that require effective merging. In hybrid ASR systems, while acoustic models can be merged using established methods, the language model (LM) for rescoring the N-best speech recognition list faces challenges due to the heterogeneity of non-neural n-gram models and neural network models. This paper proposes a heterogeneous LM optimization task and introduces a match-and-merge paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA), using genetic operations to evolve and pair LMs, and the Reinforced Match-and-Merge Algorithm (RMMA), leveraging reinforcement learning for efficient convergence. Experiments on seven OpenSLR datasets show RMMA achieves the lowest average Character Error Rate and better generalization than baselines, converging up to seven times faster than GMMA, highlighting the paradigm's potential for scalable, privacy-preserving ASR systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_04945
institution	arXiv
publishDate	2026
record_format	arxiv
title	Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition
topic	Computation and Language
url	https://arxiv.org/abs/2603.04945

Similar Items