| Main Authors: | Xie, Jiamin, Hansen, John H. L. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2310.18450 |
Similar Items
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition
by: Xie, Jiamin, et al.
Published: (2022)
Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)
Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)
Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
by: Carvalho, Carlos, et al.
Published: (2024)
Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
by: Pokel, Niclas, et al.
Published: (2025)
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
by: Kim, June-Woo, et al.
Published: (2024)
Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)
Linear Time Complexity Conformers with SummaryMixing for Streaming Speech Recognition
by: Parcollet, Titouan, et al.
Published: (2024)
Joint Learning using Mixture-of-Expert-Based Representation for Speech Enhancement and Robust Emotion Recognition
by: Tzeng, Jing-Tong, et al.
Published: (2025)
Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)
TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
by: Peng, Jing, et al.
Published: (2026)
Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)
Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments
by: Zhu, Pai, et al.
Published: (2024)
Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples
by: Wang, Zhenyu, et al.
Published: (2024)
EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification
by: Cho, Deok-Hyeon, et al.
Published: (2025)
TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)
Open Source State-Of-the-Art Solution for Romanian Speech Recognition
by: Pirlogeanu, Gabriel, et al.
Published: (2025)
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
by: Jin, Zengrui, et al.
Published: (2022)
Neural Blind Source Separation and Diarization for Distant Speech Recognition
by: Bando, Yoshiaki, et al.
Published: (2024)
Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations
by: Chen, Jinming, et al.
Published: (2025)
Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
by: Wu, Yihan, et al.
Published: (2024)
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
by: Jiang, Xue, et al.
Published: (2025)
The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
by: Wang, He, et al.
Published: (2024)
Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders
by: Kim, Seungbae, et al.
Published: (2025)
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
by: Bao, Chen, et al.
Published: (2025)
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)
TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)
Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
by: Fong, Seraphina, et al.
Published: (2025)
A Survey of Deep Learning for Complex Speech Spectrograms
by: Xie, Yuying, et al.
Published: (2025)
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)
Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers
by: Cai, Runyuan, et al.
Published: (2026)
Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)
RepCNN: Micro-sized, Mighty Models for Wakeword Detection
by: Kundu, Arnav, et al.
Published: (2024)