Library Catalog

Bibliographic Details
Main Authors:	Huh, Mina; Ray, Ruchira; Karnei, Corey
Format:	Preprint
Published:	2023
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2303.00510

Similar Items

Overview of the Amphion Toolkit (v0.2)
by: Li, Jiaqi, et al.
Published: (2025)

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
by: Choi, Youngwon, et al.
Published: (2026)

Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2024)

Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology
by: Dai, Weinan, et al.
Published: (2024)

Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
by: Moell, Birger, et al.
Published: (2025)

Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset
by: Shah, Neil, et al.
Published: (2024)

Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling
by: Pokel, Niclas, et al.
Published: (2025)

Unlocking Speech Instruction Data Potential with Query Rewriting
by: Hei, Yonghua, et al.
Published: (2025)

Unsupervised Speech Enhancement using Data-defined Priors
by: Klement, Dominik, et al.
Published: (2025)

Assessment of Personality Dimensions Across Situations Using Conversational Speech
by: Zhang, Alice, et al.
Published: (2025)

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)

GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)

A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation
by: Ng, Si-Ioi, et al.
Published: (2024)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
by: Wang, Xin, et al.
Published: (2024)

MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2023)

Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)

EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
by: Bian, Weizhen, et al.
Published: (2024)

Searching for Effective Preprocessing Method and CNN-based Architecture with Efficient Channel Attention on Speech Emotion Recognition
by: Kim, Byunggun, et al.
Published: (2024)

LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)

Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model
by: Oluwademilade, Adelekun, et al.
Published: (2026)

Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)

The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)

MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
by: Wu, Zhichao, et al.
Published: (2025)

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
by: Ma, Ziyang, et al.
Published: (2024)

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
by: Wang, Helin, et al.
Published: (2025)

Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)

Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)

Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates
by: Xu, Haoning, et al.
Published: (2025)

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
by: Li, Zhaoqing, et al.
Published: (2024)

Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
by: Shah, Neil, et al.
Published: (2024)

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)

MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
by: Songyi, Li, et al.
Published: (2026)

TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)