Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Li, Ze; Shi, Yao; Xu, Yunfei; Li, Ming
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.04017

_version_	1866912627717832704
author	Li, Ze; Shi, Yao; Xu, Yunfei; Li, Ming
author_facet	Li, Ze; Shi, Yao; Xu, Yunfei; Li, Ming
contents	Speaker-embedding-based zero-shot Text-to-Speech (TTS) systems enable high-quality speech synthesis for unseen speakers using minimal data. However, these systems are vulnerable to adversarial attacks, in which an attacker adds imperceptible perturbations to the original speaker's audio waveform, causing the synthesized speech to sound like another person. This vulnerability poses significant security risks, including speaker identity spoofing and unauthorized voice manipulation. This paper investigates two primary defense strategies against these threats: adversarial training and adversarial purification. Adversarial training enhances the model's robustness by incorporating adversarial examples during training, thereby improving its resistance to such attacks. Adversarial purification, on the other hand, employs diffusion probabilistic models to revert adversarially perturbed audio to its clean form. Experimental results demonstrate that these defense mechanisms can significantly reduce the impact of adversarial perturbations, enhancing the security and reliability of speaker-embedding-based zero-shot TTS systems in adversarial environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_04017
institution	arXiv
publishDate	2024
record_format	arxiv
title	Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2410.04017

Similar Items