| Main Authors: | Jiang, Yidi, Chen, Zhengyang, Tao, Ruijie, Deng, Liqun, Qian, Yanmin, Li, Haizhou |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2310.14823 |
Similar Items
Target Speech Diarization with Multimodal Prompts
by: Jiang, Yidi, et al.
Published: (2024)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
by: Tao, Ruijie, et al.
Published: (2024)
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)
3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)
BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)
Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
by: Tao, Ruijie, et al.
Published: (2024)
Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
by: Kim, Miseul, et al.
Published: (2025)
Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)
Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)
Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing
by: Meng, Hanyu, et al.
Published: (2025)
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
by: Lin, Jingru, et al.
Published: (2024)
Transferable Adversarial Attacks against ASR
by: Gao, Xiaoxue, et al.
Published: (2024)
SpeechMLC: Speech Multi-label Classification
by: Kim, Miseul, et al.
Published: (2025)
Self-Tuning Spectral Clustering for Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)
SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)
PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement
by: Zhou, Nan, et al.
Published: (2024)
Unified Audio Event Detection
by: Jiang, Yidi, et al.
Published: (2024)
The Overview of Segmental Durations Modification Algorithms on Speech Signal Characteristics
by: Jang, Kyeomeun, et al.
Published: (2025)
Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)
A Speech Production Model for Radar: Connecting Speech Acoustics with Radar-Measured Vibrations
by: Lenz, Isabella, et al.
Published: (2025)
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
by: Yang, Shu-wen, et al.
Published: (2025)
Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)
Binaural Localization Model for Speech in Noise
by: Tokala, Vikas, et al.
Published: (2025)
Speech-Based Prioritization for Schizophrenia Intervention
by: Premananth, Gowtham, et al.
Published: (2025)
Robust Detection of Underwater Target Against Non-Uniform Noise With Optical Fiber DAS Array
by: Cang, Siyuan, et al.
Published: (2025)
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025)
Brain-Informed Speech Separation for Cochlear Implants
by: Gajecki, Tom, et al.
Published: (2026)
Speech Enhancement based on cascaded two flows
by: Lee, Seonggyu, et al.
Published: (2025)
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions
by: Gao, Xiaoxue, et al.
Published: (2025)
Speech Watermarking with Discrete Intermediate Representations
by: Ji, Shengpeng, et al.
Published: (2024)
Semantic Communications for Speech Recognition
by: Weng, Zhenzi, et al.
Published: (2021)
Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain
by: Wang, Pengyu, et al.
Published: (2025)