Bibliographic details
Main authors: Xu, Jingjing; Zhou, Wei; Yang, Zijian; Beck, Eugen; Schlueter, Ralf
Format: Preprint
Published: 2024
Subjects:
Online access: https://arxiv.org/abs/2407.18930
_version_ 1866910544568516608
author Xu, Jingjing
Zhou, Wei
Yang, Zijian
Beck, Eugen
Schlueter, Ralf
author_facet Xu, Jingjing
Zhou, Wei
Yang, Zijian
Beck, Eugen
Schlueter, Ralf
contents Models of varying size are often required to deploy ASR systems under different hardware and/or application constraints such as memory and latency. To avoid redundant training and optimization efforts for individual models of different sizes, we present the dynamic encoder size approach, which jointly trains multiple performant models within one supernet from scratch. These subnets of various sizes are layer-wise pruned from the supernet and thus enjoy full parameter sharing. By combining score-based pruning with supernet training, we propose two novel methods, Simple-Top-k and Iterative-Zero-Out, to automatically select the best-performing subnets in a data-driven manner, avoiding resource-intensive search efforts. Our experiments using CTC on both the Librispeech and TED-LIUM-v2 corpora show that our methods perform on par with individually trained models of each size category. In addition, our approach consistently brings small performance improvements for the full-size supernet.
format Preprint
id arxiv_https___arxiv_org_abs_2407_18930
institution arXiv
publishDate 2024
record_format arxiv
title Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
topic Audio and Speech Processing
Computation and Language
Machine Learning
url https://arxiv.org/abs/2407.18930
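note The abstract describes selecting subnets by layer-wise, score-based pruning. As a rough illustration of that general idea only, the sketch below keeps the k highest-scoring encoder layers and preserves their original order. It assumes one scalar score per layer is already available; how the scores are learned, and the actual definitions of Simple-Top-k and Iterative-Zero-Out, are not given in this record. The function names `select_top_k_layers` and `layer_mask` are hypothetical.

```python
from typing import List, Sequence


def select_top_k_layers(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring layers, in original layer order.

    Assumption: `scores` holds one importance score per encoder layer.
    The record does not say how such scores are obtained.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of layers")
    # Rank layer indices by descending score; ties keep the earlier layer first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top k and restore the order in which layers appear in the encoder.
    return sorted(ranked[:k])


def layer_mask(scores: Sequence[float], k: int) -> List[int]:
    """Binary mask over layers: 1 = layer kept in the subnet, 0 = layer pruned."""
    keep = set(select_top_k_layers(scores, k))
    return [1 if i in keep else 0 for i in range(len(scores))]


if __name__ == "__main__":
    example_scores = [0.9, 0.1, 0.5, 0.7]  # made-up values for illustration
    print(select_top_k_layers(example_scores, 2))
    print(layer_mask(example_scores, 2))
```

Because every subnet is a subset of the supernet's layers, a mask like this would let several model sizes share one set of parameters, which is the property the abstract calls full parameter sharing.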