| Main Authors: | Chen, Maximillian; Yu, Zhou |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2302.12921 |
Similar Items
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
by: Leem, Seong-Gyun, et al.
Published: (2024)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)
Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
by: Nagpal, Chirag, et al.
Published: (2024)
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
by: Yang, Chao-Han Huck, et al.
Published: (2024)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)
Moonshine: Speech Recognition for Live Transcription and Voice Commands
by: Jeffries, Nat, et al.
Published: (2024)
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)
Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech
by: Mitra, Vikramjit, et al.
Published: (2024)
Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024)
TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation
by: Yun, Taeyang, et al.
Published: (2024)
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
by: Kanda, Naoyuki, et al.
Published: (2024)
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
by: Bai, Junwen, et al.
Published: (2024)
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023)
Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
by: Shor, Joel, et al.
Published: (2023)
OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
by: Ngo, Huong, et al.
Published: (2025)
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
by: Chan, David M., et al.
Published: (2024)
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
by: Hilmes, Benedikt, et al.
Published: (2024)
SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
by: Wang, Pu, et al.
Published: (2026)
FlashSpeech: Efficient Zero-Shot Speech Synthesis
by: Ye, Zhen, et al.
Published: (2024)
Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)
On the Contribution of Lexical Features to Speech Emotion Recognition
by: Combei, David
Published: (2025)
Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian
by: Chaparala, Kaavya, et al.
Published: (2024)
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
by: Chen, Weidong, et al.
Published: (2023)
CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
by: Attia, Ahmed Adel, et al.
Published: (2024)
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
by: Hu, Yuchen, et al.
Published: (2024)
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)
Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)
Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
by: Glazer, Neta, et al.
Published: (2025)
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)
Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?
by: Phukan, Orchid Chetia, et al.
Published: (2024)
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
by: Peng, Puyuan, et al.
Published: (2024)
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)
Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)