Bibliographic Details
Main Authors:	Zampierin, Luca; Hacene, Ghouthi Boukli; Nguyen, Bac; Ravanelli, Mirco
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2402.16830

Table of Contents:

Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To improve its efficiency, previous works often rely on compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this line of research by introducing SKILL, a novel method that distills groups of layers rather than individual, arbitrarily selected layers of the teacher network. The layers to distill are identified by a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M-parameter model class across several SUPERB tasks.
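
The layer-grouping step described in the abstract can be illustrated with a minimal sketch. The abstract names neither the similarity measure nor the linkage rule, so the following assumes a precomputed symmetric layer-similarity matrix and average-linkage agglomerative clustering. The function `cluster_layers` and the matrix `toy_similarity` are hypothetical names with made-up values, not taken from the paper.

```python
from itertools import combinations


def cluster_layers(similarity, num_groups):
    """Group layers by average-linkage agglomerative clustering.

    similarity: symmetric n x n list of lists, higher means more similar
                (assumed to be precomputed; the measure itself is not specified here).
    num_groups: number of layer groups to return.
    Returns a sorted list of sorted layer-index groups.
    """
    n = len(similarity)
    if not 1 <= num_groups <= n:
        raise ValueError("num_groups must be between 1 and the number of layers")

    # Start with every layer in its own cluster.
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Average pairwise similarity between two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_groups:
        # Pick the pair of clusters with the highest average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return sorted(clusters)


# Illustrative similarity matrix for six layers (invented values).
toy_similarity = [
    [1.0, 0.9, 0.3, 0.2, 0.2, 0.1],
    [0.9, 1.0, 0.4, 0.3, 0.2, 0.1],
    [0.3, 0.4, 1.0, 0.8, 0.7, 0.2],
    [0.2, 0.3, 0.8, 1.0, 0.9, 0.3],
    [0.2, 0.2, 0.7, 0.9, 1.0, 0.4],
    [0.1, 0.1, 0.2, 0.3, 0.4, 1.0],
]

groups = cluster_layers(toy_similarity, num_groups=3)
```

With this toy matrix, the three groups are layers 0 and 1, layers 2 to 4, and layer 5 on its own. How each group is then turned into a distillation target is not stated in the abstract and is left out of the sketch.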

Similar Items