| Main Authors: | Chen, Yang; Wang, Hui; Wang, Shiyao; Chen, Junyang; He, Jiabei; Zhou, Jiaming; Yang, Xi; Wang, Yequan; Lin, Yonghua; Qin, Yong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.16578 |
Similar Items
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
by: Zhou, Jiaming, et al.
Published: (2024)
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition
by: Zhou, Jiaming, et al.
Published: (2025)
WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations
by: Wang, Hui, et al.
Published: (2025)
A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition
by: Wang, Shiyao, et al.
Published: (2025)
PB-LRDWWS System for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge
by: Wang, Shiyao, et al.
Published: (2024)
EchoVoices: Preserving Generational Voices and Memories for Seniors and Children
by: Xu, Haiying, et al.
Published: (2025)
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
by: Zhou, Jiaming, et al.
Published: (2024)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
by: Wang, Shiyao, et al.
Published: (2024)
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
by: Wang, Hui, et al.
Published: (2025)
TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models
by: Wang, Hui, et al.
Published: (2025)
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
by: Wang, Hui, et al.
Published: (2025)
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
by: Sun, Haoqin, et al.
Published: (2025)
RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization
by: Yang, Bing, et al.
Published: (2024)
MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation
by: Liu, Cheng, et al.
Published: (2025)
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
by: Zhou, Jiaming, et al.
Published: (2024)
Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition
by: Sun, Haoqin, et al.
Published: (2024)
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
by: Liu, Sen, et al.
Published: (2024)
Cross-Talk Speech Reduction, by Separation, for Separation
by: Wang, Zhong-Qiu, et al.
Published: (2026)
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Generating Novel and Realistic Speakers for Voice Conversion
by: Chen, Meiying Melissa, et al.
Published: (2025)
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
by: Jia, Yuhang, et al.
Published: (2024)
DIFFA: Large Language Diffusion Models Can Listen and Understand
by: Zhou, Jiaming, et al.
Published: (2025)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment
by: Wang, Xuechen, et al.
Published: (2024)
Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)
ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
by: Wang, Zhong-Qiu
Published: (2024)
AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification
by: Chen, Xinyi, et al.
Published: (2025)
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2023)
Combined Generative and Predictive Modeling for Speech Super-resolution
by: Wang, Heming, et al.
Published: (2024)
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations
by: Peng, Ziqiao, et al.
Published: (2025)
Uncertainty-Aware Mean Opinion Score Prediction
by: Wang, Hui, et al.
Published: (2024)
Residual Speaker Representation for One-Shot Voice Conversion
by: Xu, Le, et al.
Published: (2023)
CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
by: Li, Yuke, et al.
Published: (2024)
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
by: Zhu, Xinfa, et al.
Published: (2025)
Cross-Talk Reduction
by: Wang, Zhong-Qiu, et al.
Published: (2024)