Saved in:
| Main Authors: | Chen, Youjun, Li, Guinan, Geng, Mengzhe, Xie, Xurong, Hu, Shujie, Wang, Huimeng, Xu, Haoning, Deng, Chengxi, Deng, Jiajun, Li, Zhaoqing, Cui, Mingyu, Liu, Xunying |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.18802 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
by: Chen, Youjun, et al.
Published: (2025)
by: Chen, Youjun, et al.
Published: (2025)
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025)
by: Wang, Huimeng, et al.
Published: (2025)
On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: HU, Shujie, et al.
Published: (2025)
by: HU, Shujie, et al.
Published: (2025)
Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates
by: Xu, Haoning, et al.
Published: (2025)
by: Xu, Haoning, et al.
Published: (2025)
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
by: Xu, Haoning, et al.
Published: (2025)
by: Xu, Haoning, et al.
Published: (2025)
Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)
by: Hu, Shujie, et al.
Published: (2024)
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)
by: Li, Guinan, et al.
Published: (2024)
Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation
by: Wang, Huimeng, et al.
Published: (2024)
by: Wang, Huimeng, et al.
Published: (2024)
MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
by: Deng, Chengxi, et al.
Published: (2025)
by: Deng, Chengxi, et al.
Published: (2025)
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)
by: Hu, Shujie, et al.
Published: (2024)
Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition
by: Zhong, Tao, et al.
Published: (2025)
by: Zhong, Tao, et al.
Published: (2025)
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
by: Li, Zhaoqing, et al.
Published: (2024)
by: Li, Zhaoqing, et al.
Published: (2024)
Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)
by: Cui, Mingyu, et al.
Published: (2025)
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
by: Wang, Tianzi, et al.
Published: (2024)
by: Wang, Tianzi, et al.
Published: (2024)
Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation
by: Geng, Mengzhe, et al.
Published: (2024)
by: Geng, Mengzhe, et al.
Published: (2024)
Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models
by: Li, Zhaoqing, et al.
Published: (2025)
by: Li, Zhaoqing, et al.
Published: (2025)
Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision
by: Li, Zhaoqing, et al.
Published: (2025)
by: Li, Zhaoqing, et al.
Published: (2025)
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)
by: Cui, Mingyu, et al.
Published: (2024)
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
by: Jin, Zengrui, et al.
Published: (2022)
by: Jin, Zengrui, et al.
Published: (2022)
Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)
by: Xie, Xurong, et al.
Published: (2022)
Cocktail-Party Audio-Visual Speech Recognition
by: Nguyen, Thai-Binh, et al.
Published: (2025)
by: Nguyen, Thai-Binh, et al.
Published: (2025)
Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)
by: Xie, Xurong, et al.
Published: (2022)
Towards Effective and Efficient Non-autoregressive decoders for Conformer and LLM-based ASR using Block-based Attention Mask
by: Wang, Tianzi, et al.
Published: (2025)
by: Wang, Tianzi, et al.
Published: (2025)
SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
by: Wang, Huimeng, et al.
Published: (2026)
by: Wang, Huimeng, et al.
Published: (2026)
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
by: Wang, Dingdong, et al.
Published: (2026)
by: Wang, Dingdong, et al.
Published: (2026)
UNISON: A Unified Sound Generation and Editing Framework via Deep LLM Fusion
by: Li, Zhaoqing, et al.
Published: (2026)
by: Li, Zhaoqing, et al.
Published: (2026)
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
by: Rahimi, Akam, et al.
Published: (2025)
by: Rahimi, Akam, et al.
Published: (2025)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
by: Kang, Jiawen, et al.
Published: (2024)
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
by: Kang, Jiawen, et al.
Published: (2024)
by: Kang, Jiawen, et al.
Published: (2024)
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
by: Meng, Lingwei, et al.
Published: (2024)
by: Meng, Lingwei, et al.
Published: (2024)
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)
by: Jiang, Yicong, et al.
Published: (2024)
Supporting SENCOTEN Language Documentation Efforts with Automatic Speech Recognition
by: Geng, Mengzhe, et al.
Published: (2025)
by: Geng, Mengzhe, et al.
Published: (2025)
Exploring SSL Discrete Tokens for Multilingual ASR
by: Cui, Mingyu, et al.
Published: (2024)
by: Cui, Mingyu, et al.
Published: (2024)
Enhancing Speech Emotion Recognition with Multi-Task Learning and Dynamic Feature Fusion
by: Wang, Honghong, et al.
Published: (2025)
by: Wang, Honghong, et al.
Published: (2025)
Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)
by: Chen, Yanan, et al.
Published: (2024)
Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)
by: Li, Yuanchao
Published: (2026)
Bayesian Learning for Deep Neural Network Adaptation
by: Xie, Xurong, et al.
Published: (2020)
by: Xie, Xurong, et al.
Published: (2020)
Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
by: Melhem, Rawad, et al.
Published: (2023)
by: Melhem, Rawad, et al.
Published: (2023)
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)
by: Gong, Xun, et al.
Published: (2024)
A Comprehensive Study on the Effectiveness of ASR Representations for Noise-Robust Speech Emotion Recognition
by: Shi, Xiaohan, et al.
Published: (2023)
by: Shi, Xiaohan, et al.
Published: (2023)
Similar Items
-
Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition
by: Chen, Youjun, et al.
Published: (2025) -
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025) -
On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: HU, Shujie, et al.
Published: (2025) -
Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates
by: Xu, Haoning, et al.
Published: (2025) -
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
by: Xu, Haoning, et al.
Published: (2025)