MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Jianing; Nakata, Wataru; Saito, Yuki; Saruwatari, Hiroshi
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.13700

_version_	1866908777000730624
author	Yang, Jianing; Nakata, Wataru; Saito, Yuki; Saruwatari, Hiroshi
contents	With the advancement of self-supervised learning (SSL), fine-tuning pretrained SSL models for mean opinion score (MOS) prediction has achieved state-of-the-art performance. However, during fine-tuning, these SSL-based MOS prediction models often suffer from catastrophic forgetting of the pretrained knowledge and tend to overfit the training set, resulting in poor generalization performance. In this study, we propose DistilMOS, a novel method that learns to predict not only MOS but also token IDs obtained by clustering the hidden representations of each layer in the pretrained SSL model. These layer-wise token targets serve as self-distillation signals that enable the MOS prediction model to extract rich internal knowledge from SSL models, enhancing both prediction accuracy and generalization capability. Experimental evaluations demonstrate that our method significantly outperforms standard SSL-based MOS prediction models on both in-domain and out-of-domain evaluations, verifying the effectiveness and practicality of the proposed method.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_13700
institution	arXiv
publishDate	2026
record_format	arxiv
title	DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction
topic	Sound
url	https://arxiv.org/abs/2601.13700
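The abstract describes the DistilMOS training targets only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes k-means as the clustering step (the abstract says only "clustering"), frame-level features taken from each layer of a frozen copy of the pretrained SSL model, a squared-error MOS loss, per-layer token logits produced by the model being fine-tuned, and a hypothetical `weight` that balances the two terms. The helper names `kmeans`, `assign_tokens`, `cross_entropy`, and `distil_loss` are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means on the rows of x (N x D); returns centroids (k x D).
    Assumption: k-means stands in for the unspecified clustering method."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct frames (fancy indexing makes a copy).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every frame to every centroid: shape (N, k).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(0)
    return centroids

def assign_tokens(x, centroids):
    """Map each frame of one layer's hidden states to its nearest centroid ID."""
    d = ((np.asarray(x, dtype=float)[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

def cross_entropy(logits, targets):
    """Mean frame-level cross-entropy between logits (T x K) and integer targets (T,)."""
    logits = logits - logits.max(1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def distil_loss(mos_pred, mos_true, layer_logits, layer_tokens, weight=1.0):
    """Hypothetical multi-task loss: MOS regression plus layer-wise token prediction.
    layer_tokens come from the frozen pretrained SSL model (the self-distillation
    targets); layer_logits come from the model being fine-tuned."""
    mos_loss = (mos_pred - mos_true) ** 2
    token_loss = np.mean([cross_entropy(lg, tk)
                          for lg, tk in zip(layer_logits, layer_tokens)])
    return mos_loss + weight * token_loss
```

In this sketch the token targets are computed once from the pretrained model and then held fixed, so the token term keeps pulling the fine-tuned model toward the pretrained layer structure while the MOS term fits the ratings. That is one plausible reading of how layer-wise targets could counter catastrophic forgetting, not a description confirmed by the record.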

Similar Items