Bibliographic Details
Main Authors: Hong, Mengze, Gu, Yi, Jiang, Di, Gu, Hanlin, Zhang, Chen Jason, Wang, Lu, Su, Zhiyang
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2603.04945
author Hong, Mengze
Gu, Yi
Jiang, Di
Gu, Hanlin
Zhang, Chen Jason
Wang, Lu
Su, Zhiyang
contents Training automatic speech recognition (ASR) models increasingly relies on decentralized federated learning to ensure data privacy and accessibility, producing multiple local models that require effective merging. In hybrid ASR systems, while acoustic models can be merged using established methods, the language model (LM) for rescoring the N-best speech recognition list faces challenges due to the heterogeneity of non-neural n-gram models and neural network models. This paper proposes a heterogeneous LM optimization task and introduces a match-and-merge paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA), using genetic operations to evolve and pair LMs, and the Reinforced Match-and-Merge Algorithm (RMMA), leveraging reinforcement learning for efficient convergence. Experiments on seven OpenSLR datasets show RMMA achieves the lowest average Character Error Rate and better generalization than baselines, converging up to seven times faster than GMMA, highlighting the paradigm's potential for scalable, privacy-preserving ASR systems.
format Preprint
id arxiv_https___arxiv_org_abs_2603_04945
institution arXiv
publishDate 2026
record_format arxiv
title Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition
topic Computation and Language
url https://arxiv.org/abs/2603.04945