Library Catalog

Bibliographic Details
Main Authors:	Abdullah, Abdulhady Abas; Badawi, Soran; Abdullah, Dana A.; Hamad, Dana Rasul
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2505.04629

Similar Items

Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
by: Abdullah, Abdulhady Abas, et al.
Published: (2025)

Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
by: Budaghyan, David, et al.
Published: (2023)

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects
by: Yang, Sicheng, et al.
Published: (2026)

Dialectal Coverage And Generalization in Arabic Speech Recognition
by: Djanibekov, Amirbek, et al.
Published: (2024)

Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion
by: Fischbach, Lea, et al.
Published: (2025)

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
by: Abdullah, Badr M., et al.
Published: (2025)

VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
by: Chen, Li-Wei, et al.
Published: (2024)

Closing the Gap Between Text and Speech Understanding in LLMs
by: Cuervo, Santiago, et al.
Published: (2025)

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
by: Kang, Jiawen, et al.
Published: (2024)

ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling
by: Jiang, Yuxuan, et al.
Published: (2025)

Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
by: Abdelfattah, Abdullah, et al.
Published: (2025)

Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM
by: Thebaud, Thomas, et al.
Published: (2025)

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
by: Yang, Guanrou, et al.
Published: (2024)

GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
by: Chen, Hongjie, et al.
Published: (2025)

Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization
by: Marie, Ambre, et al.
Published: (2026)

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
by: Zhang, Fengrun, et al.
Published: (2024)

Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
by: Zhang, Yuanming, et al.
Published: (2024)

Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
by: Feng, Bo-Han, et al.
Published: (2025)

Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment
by: Hasan, Sanjid, et al.
Published: (2026)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
by: Goel, Arnav, et al.
Published: (2024)

Interpolating Speaker Identities in Embedding Space for Data Expansion
by: Liu, Tianchi, et al.
Published: (2025)

Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
by: Cunningham, Jay L., et al.
Published: (2025)

AG-LSEC: Audio Grounded Lexical Speaker Error Correction
by: Paturi, Rohit, et al.
Published: (2024)

Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)

Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization
by: Li, Xiang, et al.
Published: (2024)

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)

DAME: Duration-Aware Matryoshka Embedding for Duration-Robust Speaker Verification
by: Jung, Youngmoon, et al.
Published: (2026)

A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation
by: Blaschke, Verena, et al.
Published: (2025)