Library Catalog

Bibliographic Details
Main Authors:	Chen, Maximillian; Yu, Zhou
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Machine Learning; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2302.12921

Similar Items

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)

Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
by: Leem, Seong-Gyun, et al.
Published: (2024)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)

Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)

Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)

Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
by: Nagpal, Chirag, et al.
Published: (2024)

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
by: Yang, Chao-Han Huck, et al.
Published: (2024)

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)

Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)

Moonshine: Speech Recognition for Live Transcription and Voice Commands
by: Jeffries, Nat, et al.
Published: (2024)

Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)

Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech
by: Mitra, Vikramjit, et al.
Published: (2024)

Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024)

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation
by: Yun, Taeyang, et al.
Published: (2024)

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
by: Kanda, Naoyuki, et al.
Published: (2024)

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
by: Bai, Junwen, et al.
Published: (2024)

MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023)

Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
by: Shor, Joel, et al.
Published: (2023)

OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
by: Ngo, Huong, et al.
Published: (2025)

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
by: Chan, David M., et al.
Published: (2024)

On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
by: Hilmes, Benedikt, et al.
Published: (2024)

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
by: Wang, Pu, et al.
Published: (2026)

FlashSpeech: Efficient Zero-Shot Speech Synthesis
by: Ye, Zhen, et al.
Published: (2024)

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)

On the Contribution of Lexical Features to Speech Emotion Recognition
by: Combei, David
Published: (2025)

Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian
by: Chaparala, Kaavya, et al.
Published: (2024)

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
by: Chen, Weidong, et al.
Published: (2023)

CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
by: Attia, Ahmed Adel, et al.
Published: (2024)

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
by: Hu, Yuchen, et al.
Published: (2024)

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)

Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)

Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)

Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
by: Glazer, Neta, et al.
Published: (2025)

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)

Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?
by: Phukan, Orchid Chetia, et al.
Published: (2024)

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
by: Peng, Puyuan, et al.
Published: (2024)

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)

Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)