| Main Authors: | Ko, Myeongjin, Choi, Yong-Hoon |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2308.01573 |
Similar Items
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
by: Lee, Sang-Hoon, et al.
Published: (2024)
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
by: Kaneko, Takuhiro, et al.
Published: (2024)
Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)
RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer
by: Hong, Seongho, et al.
Published: (2025)
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
by: Manku, Ruskin Raj, et al.
Published: (2025)
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
by: Lee, Sang-Hoon, et al.
Published: (2024)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
Are Deep Speech Denoising Models Robust to Adversarial Noise?
by: Schwarzer, Will, et al.
Published: (2025)
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
by: Liu, Jeongmin, et al.
Published: (2024)
Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)
Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
by: Lepage, Theo, et al.
Published: (2024)
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
by: Melechovsky, Jan, et al.
Published: (2024)
DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
by: Wu, Haibin, et al.
Published: (2021)
On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
by: Guo, Chenyang, et al.
Published: (2024)
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
by: Liang, Ziqi, et al.
Published: (2024)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models
by: Gao, Chenyang, et al.
Published: (2024)
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
by: Chen, Xing, et al.
Published: (2022)
Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)
Multi-modal Adversarial Training for Zero-Shot Voice Cloning
by: Janiczek, John, et al.
Published: (2024)
Compact Neural TTS Voices for Accessibility
by: Jain, Kunal, et al.
Published: (2025)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)
DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
by: Yang, Jinhyeok, et al.
Published: (2024)
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
by: Kawamura, Masaya, et al.
Published: (2025)
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
by: Choi, Jeongsoo, et al.
Published: (2025)
Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
by: Akram, Ali, et al.
Published: (2024)
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
by: Bandyopadhyay, Tathagata
Published: (2024)
Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)
Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning
by: Bhuyan, Amit Kumar, et al.
Published: (2024)
HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)