Library Catalog

Bibliographic Details
Main Authors:	Sukhadia, Vrunda N., Chowdhury, Shammur Absar
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2604.14186

Similar Items

Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)

HARNESS: Lightweight Distilled Arabic Speech Foundation Models
by: Sukhadia, Vrunda N., et al.
Published: (2025)

Multi-Task Instruction Tuning via Data Scheduling for Low-Resource Arabic AudioLLMs
by: Bhatti, Hunzalah Hassan, et al.
Published: (2026)

From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models
by: Ersoy, Asım, et al.
Published: (2025)

Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
by: Kheir, Yassine El, et al.
Published: (2024)

Speech Representation Analysis based on Inter- and Intra-Model Similarities
by: Kheir, Yassine El, et al.
Published: (2024)

Unifying Model and Layer Fusion for Speech Foundation Models
by: Shih, Yi-Jen, et al.
Published: (2025)

MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond
by: Huzaifah, Muhammad, et al.
Published: (2024)

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)

CAFE: A Novel Code switching Dataset for Algerian Dialect French and English
by: Lachemat, Houssam Eddine-Othman, et al.
Published: (2024)

Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models
by: Lameris, Harm, et al.
Published: (2025)

MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
by: Ali, Zien Sheikh, et al.
Published: (2026)

X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
by: Cao, Di, et al.
Published: (2026)

Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)

DM-Codec: Distilling Multimodal Representations for Speech Tokenization
by: Ahasan, Md Mubtasim, et al.
Published: (2024)

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
by: Gaido, Marco, et al.
Published: (2024)

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization
by: Jo, Daejin, et al.
Published: (2025)

Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
by: Hsu, Ming-Hao, et al.
Published: (2023)

Streaming Speech-to-Text Translation with a SpeechLLM
by: Parcollet, Titouan, et al.
Published: (2026)

Phonology-Guided Speech-to-Speech Translation for African Languages
by: Ochieng, Peter, et al.
Published: (2024)

Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
by: Wang, Qiongqiong, et al.
Published: (2025)

Speech Retrieval-Augmented Generation without Automatic Speech Recognition
by: Min, Do June, et al.
Published: (2024)

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)

Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
by: Teleki, Maria, et al.
Published: (2025)

SyllableLM: Learning Coarse Semantic Units for Speech Language Models
by: Baade, Alan, et al.
Published: (2024)

Pretraining Large Brain Language Model for Active BCI: Silent Speech
by: Zhou, Jinzhao, et al.
Published: (2025)

SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
by: Gong, Hongyu, et al.
Published: (2024)

An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
by: Labrak, Yanis, et al.
Published: (2025)

SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2025)

SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation
by: Djanibekov, Amirbek, et al.
Published: (2026)

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
by: Kuroki, So, et al.
Published: (2025)

Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)

When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition
by: Moure, Pehuén, et al.
Published: (2026)

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
by: Yang, Guanrou, et al.
Published: (2025)

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024)

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects
by: Yang, Sicheng, et al.
Published: (2026)

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
by: Fan, Xiaoran, et al.
Published: (2025)

Pheme: Efficient and Conversational Speech Generation
by: Budzianowski, Paweł, et al.
Published: (2024)