| Main Authors: | San, Nay, Paraskevopoulos, Georgios, Arora, Aryaman, He, Xiluo, Kaur, Prabhjot, Adams, Oliver, Jurafsky, Dan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.02302 |
Similar Items
The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
Direct Punjabi to English speech translation using discrete units
by: Kaur, Prabhjot, et al.
Published: (2024)
MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR
by: Damianos, Dimitrios, et al.
Published: (2025)
End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)
Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)
FreeCodec: A disentangled neural speech codec with fewer tokens
by: Zheng, Youqiang, et al.
Published: (2024)
Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)
Meta-learning-based percussion transcription and $t\bar{a}la$ identification from low-resource audio
by: Kodag, Rahul Bapusaheb, et al.
Published: (2025)
TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants
by: Chiang, Hsin-Tien, et al.
Published: (2026)
Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)
VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion
by: Damianos, Dimitrios, et al.
Published: (2025)
Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
Automated evaluation of children's speech fluency for low-resource languages
by: Zhang, Bowen, et al.
Published: (2025)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)
Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)
Joint decoding method for controllable contextual speech recognition based on Speech LLM
by: Fang, Yangui, et al.
Published: (2025)
Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
by: Gogoi, Parismita, et al.
Published: (2025)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)
Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)
Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer
by: Kim, Jeong-Yoon, et al.
Published: (2024)
Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)
Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)
An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech
by: Deng, Qingkun, et al.
Published: (2024)
Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)
Guiding the underwater acoustic target recognition with interpretable contrastive learning
by: Xie, Yuan, et al.
Published: (2024)
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)
Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)
PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
by: Pandey, Rahul, et al.
Published: (2023)
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)