:: Library Catalog

Bibliographic Details
Main Authors:	San, Nay; Paraskevopoulos, Georgios; Arora, Aryaman; He, Xiluo; Kaur, Prabhjot; Adams, Oliver; Jurafsky, Dan
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2402.02302

Similar Items

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)

Direct Punjabi to English speech translation using discrete units
by: Kaur, Prabhjot, et al.
Published: (2024)

MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR
by: Damianos, Dimitrios, et al.
Published: (2025)

End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)

Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)

FreeCodec: A disentangled neural speech codec with fewer tokens
by: Zheng, Youqiang, et al.
Published: (2024)

Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)

Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)

Meta-learning-based percussion transcription and tāla identification from low-resource audio
by: Kodag, Rahul Bapusaheb, et al.
Published: (2025)

TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants
by: Chiang, Hsin-Tien, et al.
Published: (2026)

Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)

VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion
by: Damianos, Dimitrios, et al.
Published: (2025)

Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

Automated evaluation of children's speech fluency for low-resource languages
by: Zhang, Bowen, et al.
Published: (2025)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)

Joint decoding method for controllable contextual speech recognition based on Speech LLM
by: Fang, Yangui, et al.
Published: (2025)

Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
by: Gogoi, Parismita, et al.
Published: (2025)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)

Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)

Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)

Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer
by: Kim, Jeong-Yoon, et al.
Published: (2024)

Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech
by: Deng, Qingkun, et al.
Published: (2024)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

Guiding the underwater acoustic target recognition with interpretable contrastive learning
by: Xie, Yuan, et al.
Published: (2024)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
by: Pandey, Rahul, et al.
Published: (2023)

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)