| Main Authors: | Huh, Mina; Ray, Ruchira; Karnei, Corey |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2303.00510 |
Similar Items
Overview of the Amphion Toolkit (v0.2)
by: Li, Jiaqi, et al.
Published: (2025)
ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
by: Choi, Youngwon, et al.
Published: (2026)
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2024)
Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology
by: Dai, Weinan, et al.
Published: (2024)
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
by: Moell, Birger, et al.
Published: (2025)
Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset
by: Shah, Neil, et al.
Published: (2024)
Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling
by: Pokel, Niclas, et al.
Published: (2025)
Unlocking Speech Instruction Data Potential with Query Rewriting
by: Hei, Yonghua, et al.
Published: (2025)
Unsupervised Speech Enhancement using Data-defined Priors
by: Klement, Dominik, et al.
Published: (2025)
Assessment of Personality Dimensions Across Situations Using Conversational Speech
by: Zhang, Alice, et al.
Published: (2025)
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)
A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation
by: Ng, Si-Ioi, et al.
Published: (2024)
Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
by: Wang, Xin, et al.
Published: (2024)
MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2023)
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
by: Bian, Weizhen, et al.
Published: (2024)
Searching for Effective Preprocessing Method and CNN-based Architecture with Efficient Channel Attention on Speech Emotion Recognition
by: Kim, Byunggun, et al.
Published: (2024)
LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)
Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model
by: Oluwademilade, Adelekun, et al.
Published: (2026)
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)
The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
by: Wu, Zhichao, et al.
Published: (2025)
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)
Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
by: Ma, Ziyang, et al.
Published: (2024)
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)
Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
by: Wang, Helin, et al.
Published: (2025)
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)
Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates
by: Xu, Haoning, et al.
Published: (2025)
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
by: Li, Zhaoqing, et al.
Published: (2024)
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
by: Shah, Neil, et al.
Published: (2024)
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)
MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
by: Songyi, Li, et al.
Published: (2026)
TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)