Bibliographic Details
Main Authors:	Xu, Jingjing; Zhou, Wei; Yang, Zijian; Beck, Eugen; Schlueter, Ralf
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2407.18930

_version_	1866910544568516608
author	Xu, Jingjing; Zhou, Wei; Yang, Zijian; Beck, Eugen; Schlueter, Ralf
author_facet	Xu, Jingjing; Zhou, Wei; Yang, Zijian; Beck, Eugen; Schlueter, Ralf
contents	Models of varying size are often required to deploy ASR systems under different hardware and/or application constraints such as memory and latency. To avoid redundant training and optimization efforts for individual models of different sizes, we present the dynamic encoder size approach, which jointly trains multiple performant models within one supernet from scratch. These subnets of various sizes are layer-wise pruned from the supernet and thus enjoy full parameter sharing. By combining score-based pruning with supernet training, we propose two novel methods, Simple-Top-k and Iterative-Zero-Out, to automatically select the best-performing subnets in a data-driven manner, avoiding resource-intensive search efforts. Our experiments using CTC on both the Librispeech and TED-LIUM-v2 corpora show that our methods can achieve performance on par with individually trained models of each size category. Also, our approach consistently brings small performance improvements for the full-size supernet.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_18930
institution	arXiv
publishDate	2024
record_format	arxiv
title	Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
topic	Audio and Speech Processing; Computation and Language; Machine Learning
url	https://arxiv.org/abs/2407.18930
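
The abstract in the contents field mentions score-based, layer-wise pruning with a method named Simple-Top-k, but this record does not say how the scores are learned or how Iterative-Zero-Out works. As a rough illustration of the general idea only (keep the k highest-scoring layers and skip the rest), a minimal sketch might look like the following. The function names, the scalar toy layers, and the score values are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# A "layer" here is just a function on a scalar; a real encoder layer would
# act on feature tensors. This simplification is an assumption for brevity.
Layer = Callable[[float], float]


def select_top_k_layers(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring layers, in depth order."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the number of layers")
    # Rank layer indices by score, highest first (ties keep the earlier layer).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore depth order so the kept layers run in their original sequence.
    return sorted(ranked[:k])


def run_subnet(layers: Sequence[Layer], keep: Sequence[int], x: float) -> float:
    """Apply only the kept layers; a pruned layer is skipped (identity)."""
    kept = set(keep)
    for i, layer in enumerate(layers):
        if i in kept:
            x = layer(x)
    return x


# Toy "supernet" with four layers and made-up importance scores.
layers = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0, lambda x: x * 0.5]
scores = [0.9, 0.1, 0.7, 0.4]

keep = select_top_k_layers(scores, k=2)                     # keeps layers 0 and 2
small_output = run_subnet(layers, keep, 1.0)                # subnet of two layers
full_output = run_subnet(layers, range(len(layers)), 1.0)   # full supernet
```

Because the subnet reuses the supernet's own layer objects, every kept layer shares its parameters with the full model, which is the property the abstract describes as full parameter sharing.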