:: Library Catalog

Bibliographic Details
Main Authors:	Berriche, Lamia; Driss, Maha; Almuntashri, Areej Ahmed; Lghabi, Asma Mufreh; Almudhi, Heba Saleh; Almansour, Munerah Abdul-Aziz
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2411.13592

Similar Items

Towards Zero-Shot Text-To-Speech for Arabic Dialects
by: Doan, Khai Duy, et al.
Published: (2024)

Arabic Little STT: Arabic Children Speech Recognition Dataset
by: Alkadri, Mouhand, et al.
Published: (2025)

Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education
by: Snoubara, Abdul Aziz, et al.
Published: (2026)

Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)

Audio2Tool: Speak, Call, Act -- A Dataset for Benchmarking Speech Tool Use
by: Pahwa, Ramit, et al.
Published: (2026)

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task
by: Talafha, Bashar, et al.
Published: (2025)

Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding
by: Zhang, Xin, et al.
Published: (2025)

Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only
by: Lee, Jaejun, et al.
Published: (2026)

Dialectal Coverage And Generalization in Arabic Speech Recognition
by: Djanibekov, Amirbek, et al.
Published: (2024)

Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
by: Gheffari, Youcef Soufiane, et al.
Published: (2026)

Building Tailored Speech Recognizers for Japanese Speaking Assessment
by: Kubo, Yotaro, et al.
Published: (2025)

JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles
by: Kondo, Yuto, et al.
Published: (2025)

ARCADE: A City-Scale Corpus for Fine-Grained Arabic Dialect Tagging
by: Nacar, Omer, et al.
Published: (2026)

SpeakStream: Streaming Text-to-Speech with Interleaved Data
by: Bai, Richard He, et al.
Published: (2025)

Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
by: Kang, Wonjune, et al.
Published: (2025)

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)

Speech Separation for Hearing-Impaired Children in the Classroom
by: Olalere, Feyisayo, et al.
Published: (2025)

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
by: Chen, Yushen, et al.
Published: (2026)

Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect
by: Narain, Jaya, et al.
Published: (2025)

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
by: Sun, Haoqin, et al.
Published: (2025)

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)

Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review
by: Agro, Maha Tufail, et al.
Published: (2025)

ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech
by: Lou, Haowei, et al.
Published: (2026)

What Do Speech Foundation Models Not Learn About Speech?
by: Waheed, Abdul, et al.
Published: (2024)

LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
by: Naouara, Hedi, et al.
Published: (2025)

Contextualized Token Discrimination for Speech Search Query Correction
by: Lu, Junyu, et al.
Published: (2025)

A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
by: Wang, Chung-Chun, et al.
Published: (2025)

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
by: Attia, Ahmed Adel, et al.
Published: (2023)

Denoising GER: A Noise-Robust Generative Error Correction with LLM for Speech Recognition
by: Liu, Yanyan, et al.
Published: (2025)

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
by: Guo, Haohan, et al.
Published: (2024)

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)

Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services
by: Venkateshperumal, Danush, et al.
Published: (2024)

A multilingual training strategy for low resource Text to Speech
by: Amalas, Asma, et al.
Published: (2024)

IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)
by: Kheir, Yassine El, et al.
Published: (2026)

YMIR: A new Benchmark Dataset and Model for Arabic Yemeni Music Genre Classification Using Convolutional Neural Networks
by: AL-Makhlafi, Moeen, et al.
Published: (2026)

KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening
by: Sharma, Rohan, et al.
Published: (2025)

Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis
by: Choi, Anna Seo Gyeong, et al.
Published: (2025)

When Vision Speaks for Sound
by: Wen, Xiaofei, et al.
Published: (2026)