Library Catalog

Bibliographic Details
Main Authors:	Ko, Myeongjin; Choi, Yong-Hoon
Format:	Preprint
Published:	2023
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2308.01573

Similar Items

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
by: Lee, Sang-Hoon, et al.
Published: (2024)

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
by: Kaneko, Takuhiro, et al.
Published: (2024)

Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)

RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer
by: Hong, Seongho, et al.
Published: (2025)

Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
by: Manku, Ruskin Raj, et al.
Published: (2025)

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
by: Lee, Sang-Hoon, et al.
Published: (2024)

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)

Are Deep Speech Denoising Models Robust to Adversarial Noise?
by: Schwarzer, Will, et al.
Published: (2025)

Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
by: Liu, Jeongmin, et al.
Published: (2024)

Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)

Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
by: Lepage, Theo, et al.
Published: (2024)

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
by: Melechovsky, Jan, et al.
Published: (2024)

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
by: Wu, Haibin, et al.
Published: (2021)

On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
by: Guo, Chenyang, et al.
Published: (2024)

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
by: Liang, Ziqi, et al.
Published: (2024)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)

Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models
by: Gao, Chenyang, et al.
Published: (2024)

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
by: Chen, Xing, et al.
Published: (2022)

Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)

Multi-modal Adversarial Training for Zero-Shot Voice Cloning
by: Janiczek, John, et al.
Published: (2024)

Compact Neural TTS Voices for Accessibility
by: Jain, Kunal, et al.
Published: (2025)

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)

Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)

DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
by: Yang, Jinhyeok, et al.
Published: (2024)

BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
by: Kawamura, Masaya, et al.
Published: (2025)

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
by: Choi, Jeongsoo, et al.
Published: (2025)

Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
by: Akram, Ali, et al.
Published: (2024)

Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
by: Bandyopadhyay, Tathagata
Published: (2024)

Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)

Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning
by: Bhuyan, Amit Kumar, et al.
Published: (2024)

HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)