| Main Authors: | Liang, Ziqi; Jia, Zhijun; Liu, Chang; Yang, Minghui; Lu, Zhihong; Wang, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Sound |
| Online Access: | https://arxiv.org/abs/2602.12701 |
| _version_ | 1866912902510804992 |
|---|---|
| author | Liang, Ziqi; Jia, Zhijun; Liu, Chang; Yang, Minghui; Lu, Zhihong; Wang, Jian |
| author_facet | Liang, Ziqi; Jia, Zhijun; Liu, Chang; Yang, Minghui; Lu, Zhihong; Wang, Jian |
| contents | Previous speech restoration (SR) work primarily focuses on single-task speech restoration (SSR), which cannot address general speech restoration problems. Training a separate SSR model for each distortion is time-consuming and lacks generality. In addition, most studies ignore the problem of model generalization to unseen domains. To overcome these limitations, we propose DisSR, a general speech restoration model based on Disentangling Speech Representation, with two properties: 1) Degradation-prior guidance, which extracts a speaker-invariant degradation representation to guide the diffusion-based speech restoration model. 2) Domain adaptation, where we design cross-domain alignment training to enhance the model's adaptability and generalization on cross-domain data. Experimental results demonstrate that our method can produce high-quality restored speech under various distortion conditions. Audio samples can be found at https://itspsp.github.io/DisSR. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_12701 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | DisSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration |
| topic | Sound |
| url | https://arxiv.org/abs/2602.12701 |