| Main Authors: | Sukhadia, Vrunda N., Chowdhury, Shammur Absar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.14186 |
Similar Items
Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)
HARNESS: Lightweight Distilled Arabic Speech Foundation Models
by: Sukhadia, Vrunda N., et al.
Published: (2025)
Multi-Task Instruction Tuning via Data Scheduling for Low-Resource Arabic AudioLLMs
by: Bhatti, Hunzalah Hassan, et al.
Published: (2026)
From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models
by: Ersoy, Asım, et al.
Published: (2025)
Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
by: Kheir, Yassine El, et al.
Published: (2024)
Speech Representation Analysis based on Inter- and Intra-Model Similarities
by: Kheir, Yassine El, et al.
Published: (2024)
Unifying Model and Layer Fusion for Speech Foundation Models
by: Shih, Yi-Jen, et al.
Published: (2025)
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond
by: Huzaifah, Muhammad, et al.
Published: (2024)
Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)
CAFE: A Novel Code-Switching Dataset for Algerian Dialect, French, and English
by: Lachemat, Houssam Eddine-Othman, et al.
Published: (2024)
Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models
by: Lameris, Harm, et al.
Published: (2025)
MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
by: Ali, Zien Sheikh, et al.
Published: (2026)
X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
by: Cao, Di, et al.
Published: (2026)
Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)
DM-Codec: Distilling Multimodal Representations for Speech Tokenization
by: Ahasan, Md Mubtasim, et al.
Published: (2024)
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
by: Gaido, Marco, et al.
Published: (2024)
EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
by: Zhou, Li, et al.
Published: (2026)
LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization
by: Jo, Daejin, et al.
Published: (2025)
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
by: Hsu, Ming-Hao, et al.
Published: (2023)
Streaming Speech-to-Text Translation with a SpeechLLM
by: Parcollet, Titouan, et al.
Published: (2026)
Phonology-Guided Speech-to-Speech Translation for African Languages
by: Ochieng, Peter, et al.
Published: (2024)
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
by: Wang, Qiongqiong, et al.
Published: (2025)
Speech Retrieval-Augmented Generation without Automatic Speech Recognition
by: Min, Do June, et al.
Published: (2024)
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)
Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
by: Teleki, Maria, et al.
Published: (2025)
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
by: Baade, Alan, et al.
Published: (2024)
Pretraining Large Brain Language Model for Active BCI: Silent Speech
by: Zhou, Jinzhao, et al.
Published: (2025)
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
by: Gong, Hongyu, et al.
Published: (2024)
An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
by: Labrak, Yanis, et al.
Published: (2025)
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2025)
SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation
by: Djanibekov, Amirbek, et al.
Published: (2026)
KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
by: Kuroki, So, et al.
Published: (2025)
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)
When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition
by: Moure, Pehuén, et al.
Published: (2026)
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
by: Yang, Guanrou, et al.
Published: (2025)
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024)
PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects
by: Yang, Sicheng, et al.
Published: (2026)
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
by: Fan, Xiaoran, et al.
Published: (2025)
Pheme: Efficient and Conversational Speech Generation
by: Budzianowski, Paweł, et al.
Published: (2024)