| Main Authors: | Zhang, Leying, Qian, Yao, Yu, Linfeng, Wang, Heming, Yang, Hemin, Zhou, Long, Liu, Shujie, Qian, Yanmin |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2309.13874 |
Similar Items
Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)
DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
by: Zhang, Leying, et al.
Published: (2026)
SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
by: Lu, Haitian, et al.
Published: (2025)
JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
by: Zhang, Leying, et al.
Published: (2026)
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
by: Zhang, Leying, et al.
Published: (2024)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)
From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)
Improving Design of Input Condition Invariant Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
by: Huang, Wen, et al.
Published: (2025)
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)
A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation
by: Pei, Hanchen, et al.
Published: (2026)
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
by: Li, Jiaqi, et al.
Published: (2024)
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
by: Zhang, Leying, et al.
Published: (2025)
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
by: Chen, Sanyuan, et al.
Published: (2024)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models
by: Zeng, Bang, et al.
Published: (2026)
Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)
Boosting Large Language Model for Speech Synthesis: An Empirical Study
by: Hao, Hongkun, et al.
Published: (2023)
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings
by: Zhao, He, et al.
Published: (2024)
Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)
BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)
LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space
by: Xin, Detai, et al.
Published: (2026)
Probing Self-supervised Learning Models with Target Speech Extraction
by: Peng, Junyi, et al.
Published: (2024)
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)
Combined Generative and Predictive Modeling for Speech Super-resolution
by: Wang, Heming, et al.
Published: (2024)
SpatialCodec: Neural Spatial Speech Coding
by: Xu, Zhongweiyang, et al.
Published: (2023)
Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)
Target Speech Extraction with Pre-trained Self-supervised Learning Models
by: Peng, Junyi, et al.
Published: (2024)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios
by: Yang, Yiming, et al.
Published: (2026)
DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)