:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Youjun, Li, Guinan, Geng, Mengzhe, Xie, Xurong, Hu, Shujie, Wang, Huimeng, Xu, Haoning, Deng, Chengxi, Deng, Jiajun, Li, Zhaoqing, Cui, Mingyu, Liu, Xunying
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2602.18802

Similar Items

Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
by: Chen, Youjun, et al.
Published: (2025)

Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025)

On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: Hu, Shujie, et al.
Published: (2025)

Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates
by: Xu, Haoning, et al.
Published: (2025)

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
by: Xu, Haoning, et al.
Published: (2025)

Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation
by: Wang, Huimeng, et al.
Published: (2024)

MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
by: Deng, Chengxi, et al.
Published: (2025)

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition
by: Zhong, Tao, et al.
Published: (2025)

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
by: Li, Zhaoqing, et al.
Published: (2024)

Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
by: Wang, Tianzi, et al.
Published: (2024)

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation
by: Geng, Mengzhe, et al.
Published: (2024)

Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models
by: Li, Zhaoqing, et al.
Published: (2025)

Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision
by: Li, Zhaoqing, et al.
Published: (2025)

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
by: Jin, Zengrui, et al.
Published: (2022)

Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)

Cocktail-Party Audio-Visual Speech Recognition
by: Nguyen, Thai-Binh, et al.
Published: (2025)

Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)

Towards Effective and Efficient Non-autoregressive decoders for Conformer and LLM-based ASR using Block-based Attention Mask
by: Wang, Tianzi, et al.
Published: (2025)

SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
by: Wang, Huimeng, et al.
Published: (2026)

EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
by: Wang, Dingdong, et al.
Published: (2026)

UNISON: A Unified Sound Generation and Editing Framework via Deep LLM Fusion
by: Li, Zhaoqing, et al.
Published: (2026)

Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
by: Rahimi, Akam, et al.
Published: (2025)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
by: Kang, Jiawen, et al.
Published: (2024)

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
by: Meng, Lingwei, et al.
Published: (2024)

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)

Supporting SENCOTEN Language Documentation Efforts with Automatic Speech Recognition
by: Geng, Mengzhe, et al.
Published: (2025)

Exploring SSL Discrete Tokens for Multilingual ASR
by: Cui, Mingyu, et al.
Published: (2024)

Enhancing Speech Emotion Recognition with Multi-Task Learning and Dynamic Feature Fusion
by: Wang, Honghong, et al.
Published: (2025)

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Bayesian Learning for Deep Neural Network Adaptation
by: Xie, Xurong, et al.
Published: (2020)

Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
by: Melhem, Rawad, et al.
Published: (2023)

Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)

A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition
by: Shi, Xiaohan, et al.
Published: (2023)