Bibliographic Details
Main Authors: Liang, Ziqi, Jia, Zhijun, Liu, Chang, Yang, Minghui, Lu, Zhihong, Wang, Jian
Format: Preprint
Published: 2026
Subjects: Sound
Online Access: https://arxiv.org/abs/2602.12701
_version_ 1866912902510804992
author Liang, Ziqi
Jia, Zhijun
Liu, Chang
Yang, Minghui
Lu, Zhihong
Wang, Jian
contents Previous speech restoration (SR) research primarily focuses on single-task speech restoration (SSR), which cannot address general speech restoration problems. Training a separate SSR model for each distortion type is time-consuming and lacks generality. In addition, most studies ignore model generalization to unseen domains. To overcome these limitations, we propose DisSR, a general speech restoration model based on Disentangling Speech Representation, with two properties: 1) Degradation-prior guidance, which extracts a speaker-invariant degradation representation to guide the diffusion-based speech restoration model; 2) Domain adaptation, where we design cross-domain alignment training to enhance the model's adaptability and generalization on cross-domain data. Experimental results demonstrate that our method can produce high-quality restored speech under various distortion conditions. Audio samples can be found at https://itspsp.github.io/DisSR.
format Preprint
id arxiv_https___arxiv_org_abs_2602_12701
institution arXiv
publishDate 2026
record_format arxiv
title DisSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration
topic Sound
url https://arxiv.org/abs/2602.12701