| Main Authors: | Dewhurst, Maya, Collins, Jack, Lo, Justin J. H., Alderton, Roy, Kirkham, Sam |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.23339 |
Similar Items
Towards a dynamical model of English vowels. Evidence from diphthongisation
by: Strycharczuk, Patrycja, et al.
Published: (2024)
Can large audio language models understand child stuttering speech? speech summarization, and source separation
by: Okocha, Chibuzor, et al.
Published: (2025)
The NTNU System at the S&I Challenge 2025 SLA Open Track
by: Lin, Hong-Yun, et al.
Published: (2025)
kNN For Whisper And Its Effect On Bias And Speaker Adaptation
by: Nachesa, Maya K., et al.
Published: (2024)
Multilingual acoustic word embeddings for zero-resource languages
by: Jacobs, Christiaan
Published: (2024)
Visual Cues Support Robust Turn-taking Prediction in Noise
by: Russell, Sam O'Connor, et al.
Published: (2025)
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
by: Kang, Wonjune, et al.
Published: (2025)
Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning
by: Woszczyk, Dominika, et al.
Published: (2025)
Reverb: Open-Source ASR and Diarization from Rev
by: Bhandari, Nishchal, et al.
Published: (2024)
ESPnet-SpeechLM: An Open Speech Language Model Toolkit
by: Tian, Jinchuan, et al.
Published: (2025)
The THUEE System Description for the IARPA OpenASR21 Challenge
by: Zhao, Jing, et al.
Published: (2022)
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)
OpusLM: A Family of Open Unified Speech Language Models
by: Tian, Jinchuan, et al.
Published: (2025)
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language
by: Abu, Turi, et al.
Published: (2025)
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
by: Geng, Xuelong, et al.
Published: (2025)
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
by: Geng, Xuelong, et al.
Published: (2024)
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
by: Chen, Yushen, et al.
Published: (2026)
Scaling Open Discrete Audio Foundation Models with Interleaved Semantic, Acoustic, and Text Tokens
by: Manakul, Potsawee, et al.
Published: (2026)
MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios
by: Chen, Yu-Wen, et al.
Published: (2023)
Spatial Audio Processing with Large Language Model on Wearable Devices
by: Mishra, Ayushi, et al.
Published: (2025)
OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
by: Peng, Yifan, et al.
Published: (2025)
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)
Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR
by: Leang, Sotheara, et al.
Published: (2025)
EmoTale: An Enacted Speech-emotion Dataset in Danish
by: Hjuler, Maja J., et al.
Published: (2025)
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)
USAD: Universal Speech and Audio Representation via Distillation
by: Chang, Heng-Jui, et al.
Published: (2025)
On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
by: Zhu, Yi, et al.
Published: (2023)
Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
by: Raja, Vishnu, et al.
Published: (2025)
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
by: Tuttösí, Paige, et al.
Published: (2025)
Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning
by: Xuan, Xi, et al.
Published: (2026)
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
by: Abdullah, Abdulhady Abas, et al.
Published: (2025)
Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation
by: Tuttösí, Paige, et al.
Published: (2024)
Developing Enhanced Conversational Agents for Social Virtual Worlds
by: Griol, D., et al.
Published: (2025)
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)
Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
by: Manco, Ilaria, et al.
Published: (2024)
Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction
by: Qian, Mengjie, et al.
Published: (2025)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)
Assessment of L2 Oral Proficiency using Speech Large Language Models
by: Ma, Rao, et al.
Published: (2025)