:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Jiang, Yidi, Chen, Zhengyang, Tao, Ruijie, Deng, Liqun, Qian, Yanmin, Li, Haizhou
Format:	Preprint
Published:	2023
Subjects:	Audio and Speech Processing Signal Processing
Online Access:	https://arxiv.org/abs/2310.14823
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Target Speech Diarization with Multimodal Prompts
by: Jiang, Yidi, et al.
Published: (2024)

USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
by: Tao, Ruijie, et al.
Published: (2024)

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)

BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
by: Tao, Ruijie, et al.
Published: (2024)

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
by: Kim, Miseul, et al.
Published: (2025)

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing
by: Meng, Hanyu, et al.
Published: (2025)

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
by: Lin, Jingru, et al.
Published: (2024)

Transferable Adversarial Attacks against ASR
by: Gao, Xiaoxue, et al.
Published: (2024)

SpeechMLC: Speech Multi-label Classification
by: Kim, Miseul, et al.
Published: (2025)

Self-Tuning Spectral Clustering for Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)

SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)

PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement
by: Zhou, Nan, et al.
Published: (2024)

Unified Audio Event Detection
by: Jiang, Yidi, et al.
Published: (2024)

The Overview of Segmental Durations Modification Algorithms on Speech Signal Characteristics
by: Jang, Kyeomeun, et al.
Published: (2025)

Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)

A Speech Production Model for Radar: Connecting Speech Acoustics with Radar-Measured Vibrations
by: Lenz, Isabella, et al.
Published: (2025)

ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
by: Yang, Shu-wen, et al.
Published: (2025)

Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)

Binaural Localization Model for Speech in Noise
by: Tokala, Vikas, et al.
Published: (2025)

Speech-Based Prioritization for Schizophrenia Intervention
by: Premananth, Gowtham, et al.
Published: (2025)

Robust Detection of Underwater Target Against Non-Uniform Noise With Optical Fiber DAS Array
by: Cang, Siyuan, et al.
Published: (2025)

FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025)

Brain-Informed Speech Separation for Cochlear Implants
by: Gajecki, Tom, et al.
Published: (2026)

Speech Enhancement based on cascaded two flows
by: Lee, Seonggyu, et al.
Published: (2025)

Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions
by: Gao, Xiaoxue, et al.
Published: (2025)

Speech Watermarking with Discrete Intermediate Representations
by: Ji, Shengpeng, et al.
Published: (2024)

Semantic Communications for Speech Recognition
by: Weng, Zhenzi, et al.
Published: (2021)

Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain
by: Wang, Pengyu, et al.
Published: (2025)