| Main Authors: | Povey, Anna, Povey, Katherine |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.00035 |
Similar Items
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)
Spontaneous Informal Speech Dataset for Punctuation Restoration
by: Liu, Xing Yi, et al.
Published: (2024)
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
by: Hussein, Amir, et al.
Published: (2025)
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
by: Chen, Guoguo, et al.
Published: (2021)
TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition
by: Zheng, Haolong, et al.
Published: (2025)
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)
RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
by: Zheng, Zhisheng, et al.
Published: (2025)
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
by: Nagpal, Chirag, et al.
Published: (2024)
ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation
by: Yuan, Zheng, et al.
Published: (2024)
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)
SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)
Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)
Less Peaky and More Accurate CTC Forced Alignment by Label Priors
by: Huang, Ruizhe, et al.
Published: (2024)
Turbocharge Speech Understanding with Pilot Inference
by: Wang, Rongxiang, et al.
Published: (2023)
TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild
by: Chang, Kai-Wei, et al.
Published: (2026)
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
by: Fang, Hung-Chieh, et al.
Published: (2024)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
A Contrastive Learning Approach to Mitigate Bias in Speech Models
by: Koudounas, Alkis, et al.
Published: (2024)
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by: Chang, Kai-Wei, et al.
Published: (2024)
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)
Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces
by: Choi, Kwanghee, et al.
Published: (2026)
Data-Centric Lessons To Improve Speech-Language Pretraining
by: Udandarao, Vishaal, et al.
Published: (2025)
Gender Bias in Instruction-Guided Speech Synthesis Models
by: Kuan, Chun-Yi, et al.
Published: (2025)
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
by: Ogunremi, Tolulope, et al.
Published: (2023)
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
by: Ljubešić, Nikola, et al.
Published: (2024)
Reading Miscue Detection in Primary School through Automatic Speech Recognition
by: Gao, Lingyun, et al.
Published: (2024)
Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)
Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)
Self-Supervised Speech Representations are More Phonetic than Semantic
by: Choi, Kwanghee, et al.
Published: (2024)
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
by: Liu, Joseph, et al.
Published: (2024)
Scalable Frameworks for Real-World Audio-Visual Speech Recognition
by: Kim, Sungnyun
Published: (2025)
Continual Learning for Monolingual End-to-End Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2021)
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
by: Wagner, Dominik, et al.
Published: (2024)