:: Library Catalog

Bibliographic Details
Main Authors:	Povey, Anna; Povey, Katherine
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2410.00035

Similar Items

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)

Spontaneous Informal Speech Dataset for Punctuation Restoration
by: Liu, Xing Yi, et al.
Published: (2024)

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
by: Hussein, Amir, et al.
Published: (2025)

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
by: Chen, Guoguo, et al.
Published: (2021)

TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition
by: Zheng, Haolong, et al.
Published: (2025)

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)

RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
by: Zheng, Zhisheng, et al.
Published: (2025)

Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
by: Nagpal, Chirag, et al.
Published: (2024)

ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation
by: Yuan, Zheng, et al.
Published: (2024)

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)

Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)

SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)

Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)

Less Peaky and More Accurate CTC Forced Alignment by Label Priors
by: Huang, Ruizhe, et al.
Published: (2024)

Turbocharge Speech Understanding with Pilot Inference
by: Wang, Rongxiang, et al.
Published: (2023)

TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild
by: Chang, Kai-Wei, et al.
Published: (2026)

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
by: Fang, Hung-Chieh, et al.
Published: (2024)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

A Contrastive Learning Approach to Mitigate Bias in Speech Models
by: Koudounas, Alkis, et al.
Published: (2024)

SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by: Chang, Kai-Wei, et al.
Published: (2024)

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)

Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces
by: Choi, Kwanghee, et al.
Published: (2026)

Data-Centric Lessons To Improve Speech-Language Pretraining
by: Udandarao, Vishaal, et al.
Published: (2025)

Gender Bias in Instruction-Guided Speech Synthesis Models
by: Kuan, Chun-Yi, et al.
Published: (2025)

ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
by: Ogunremi, Tolulope, et al.
Published: (2023)

CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)

The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
by: Ljubešić, Nikola, et al.
Published: (2024)

Reading Miscue Detection in Primary School through Automatic Speech Recognition
by: Gao, Lingyun, et al.
Published: (2024)

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)

Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)

Self-Supervised Speech Representations are More Phonetic than Semantic
by: Choi, Kwanghee, et al.
Published: (2024)

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
by: Liu, Joseph, et al.
Published: (2024)

Scalable Frameworks for Real-World Audio-Visual Speech Recognition
by: Kim, Sungnyun
Published: (2025)

Continual Learning for Monolingual End-to-End Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2021)

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
by: Wagner, Dominik, et al.
Published: (2024)