:: Library Catalog

Bibliographic Details
Main Authors:	Dewhurst, Maya; Collins, Jack; Lo, Justin J. H.; Alderton, Roy; Kirkham, Sam
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.23339

Similar Items

Towards a dynamical model of English vowels. Evidence from diphthongisation
by: Strycharczuk, Patrycja, et al.
Published: (2024)

Can large audio language models understand child stuttering speech? speech summarization, and source separation
by: Okocha, Chibuzor, et al.
Published: (2025)

The NTNU System at the S&I Challenge 2025 SLA Open Track
by: Lin, Hong-Yun, et al.
Published: (2025)

kNN For Whisper And Its Effect On Bias And Speaker Adaptation
by: Nachesa, Maya K., et al.
Published: (2024)

Multilingual acoustic word embeddings for zero-resource languages
by: Jacobs, Christiaan
Published: (2024)

Visual Cues Support Robust Turn-taking Prediction in Noise
by: Russell, Sam O'Connor, et al.
Published: (2025)

Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
by: Kang, Wonjune, et al.
Published: (2025)

Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning
by: Woszczyk, Dominika, et al.
Published: (2025)

Reverb: Open-Source ASR and Diarization from Rev
by: Bhandari, Nishchal, et al.
Published: (2024)

ESPnet-SpeechLM: An Open Speech Language Model Toolkit
by: Tian, Jinchuan, et al.
Published: (2025)

The THUEE System Description for the IARPA OpenASR21 Challenge
by: Zhao, Jing, et al.
Published: (2022)

Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)

OpusLM: A Family of Open Unified Speech Language Models
by: Tian, Jinchuan, et al.
Published: (2025)

Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language
by: Abu, Turi, et al.
Published: (2025)

OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
by: Geng, Xuelong, et al.
Published: (2025)

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
by: Geng, Xuelong, et al.
Published: (2024)

Chain-of-Thought Training for Open E2E Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
by: Chen, Yushen, et al.
Published: (2026)

Scaling Open Discrete Audio Foundation Models with Interleaved Semantic, Acoustic, and Text Tokens
by: Manakul, Potsawee, et al.
Published: (2026)

MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios
by: Chen, Yu-Wen, et al.
Published: (2023)

Spatial Audio Processing with Large Language Model on Wearable Devices
by: Mishra, Ayushi, et al.
Published: (2025)

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
by: Peng, Yifan, et al.
Published: (2025)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR
by: Leang, Sotheara, et al.
Published: (2025)

EmoTale: An Enacted Speech-emotion Dataset in Danish
by: Hjuler, Maja J., et al.
Published: (2025)

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)

USAD: Universal Speech and Audio Representation via Distillation
by: Chang, Heng-Jui, et al.
Published: (2025)

On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
by: Zhu, Yi, et al.
Published: (2023)

Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
by: Raja, Vishnu, et al.
Published: (2025)

You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
by: Tuttösí, Paige, et al.
Published: (2025)

Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning
by: Xuan, Xi, et al.
Published: (2026)

Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
by: Abdullah, Abdulhady Abas, et al.
Published: (2025)

Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation
by: Tuttösí, Paige, et al.
Published: (2024)

Developing Enhanced Conversational Agents for Social Virtual Worlds
by: Griol, D., et al.
Published: (2025)

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)

Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
by: Manco, Ilaria, et al.
Published: (2024)

Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction
by: Qian, Mengjie, et al.
Published: (2025)

Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)

Assessment of L2 Oral Proficiency using Speech Large Language Models
by: Ma, Rao, et al.
Published: (2025)