| Main Authors: | Abdullah, Abdulhady Abas, Badawi, Soran, Abdullah, Dana A., Hamad, Dana Rasul |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.04629 |
Similar Items
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
by: Abdullah, Abdulhady Abas, et al.
Published: (2025)
Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
by: Budaghyan, David, et al.
Published: (2023)
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects
by: Yang, Sicheng, et al.
Published: (2026)
Dialectal Coverage And Generalization in Arabic Speech Recognition
by: Djanibekov, Amirbek, et al.
Published: (2024)
Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion
by: Fischbach, Lea, et al.
Published: (2025)
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
by: Abdullah, Badr M., et al.
Published: (2025)
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
by: Chen, Li-Wei, et al.
Published: (2024)
Closing the Gap Between Text and Speech Understanding in LLMs
by: Cuervo, Santiago, et al.
Published: (2025)
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
by: Kang, Jiawen, et al.
Published: (2024)
ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling
by: Jiang, Yuxuan, et al.
Published: (2025)
Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
by: Abdelfattah, Abdullah, et al.
Published: (2025)
Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM
by: Thebaud, Thomas, et al.
Published: (2025)
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
by: Yang, Guanrou, et al.
Published: (2024)
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
by: Chen, Hongjie, et al.
Published: (2025)
Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization
by: Marie, Ambre, et al.
Published: (2026)
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
by: Zhang, Fengrun, et al.
Published: (2024)
Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
by: Zhang, Yuanming, et al.
Published: (2024)
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
by: Feng, Bo-Han, et al.
Published: (2025)
Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment
by: Hasan, Sanjid, et al.
Published: (2026)
Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)
Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
by: Goel, Arnav, et al.
Published: (2024)
Interpolating Speaker Identities in Embedding Space for Data Expansion
by: Liu, Tianchi, et al.
Published: (2025)
Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
by: Cunningham, Jay L., et al.
Published: (2025)
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
by: Paturi, Rohit, et al.
Published: (2024)
Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization
by: Li, Xiang, et al.
Published: (2024)
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)
DAME: Duration-Aware Matryoshka Embedding for Duration-Robust Speaker Verification
by: Jung, Youngmoon, et al.
Published: (2026)
A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation
by: Blaschke, Verena, et al.
Published: (2025)