:: Library Catalog

Bibliographic Details
Main Authors:	Xie, Jiamin; Hansen, John H. L.
Format:	Preprint
Published:	2023
Subjects:	Audio and Speech Processing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2310.18450

Similar Items

DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition
by: Xie, Jiamin, et al.
Published: (2022)

Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)

Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)

Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)

AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
by: Carvalho, Carlos, et al.
Published: (2024)

Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
by: Pokel, Niclas, et al.
Published: (2025)

RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
by: Kim, June-Woo, et al.
Published: (2024)

Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)

Linear Time Complexity Conformers with SummaryMixing for Streaming Speech Recognition
by: Parcollet, Titouan, et al.
Published: (2024)

Joint Learning using Mixture-of-Expert-Based Representation for Speech Enhancement and Robust Emotion Recognition
by: Tzeng, Jing-Tong, et al.
Published: (2025)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
by: Peng, Jing, et al.
Published: (2026)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)

Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments
by: Zhu, Pai, et al.
Published: (2024)

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples
by: Wang, Zhenyu, et al.
Published: (2024)

EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification
by: Cho, Deok-Hyeon, et al.
Published: (2025)

TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)

Open Source State-Of-the-Art Solution for Romanian Speech Recognition
by: Pirlogeanu, Gabriel, et al.
Published: (2025)

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
by: Jin, Zengrui, et al.
Published: (2022)

Neural Blind Source Separation and Diarization for Distant Speech Recognition
by: Bando, Yoshiaki, et al.
Published: (2024)

Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations
by: Chen, Jinming, et al.
Published: (2025)

Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)

Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
by: Wu, Yihan, et al.
Published: (2024)

Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
by: Jiang, Xue, et al.
Published: (2025)

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
by: Wang, He, et al.
Published: (2024)

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)

Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders
by: Kim, Seungbae, et al.
Published: (2025)

AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
by: Bao, Chen, et al.
Published: (2025)

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)

Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
by: Fong, Seraphina, et al.
Published: (2025)

A Survey of Deep Learning for Complex Speech Spectrograms
by: Xie, Yuying, et al.
Published: (2025)

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers
by: Cai, Runyuan, et al.
Published: (2026)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

RepCNN: Micro-sized, Mighty Models for Wakeword Detection
by: Kundu, Arnav, et al.
Published: (2024)