:: Library Catalog

Bibliographic Details
Main Authors:	Gu, Bin; Guo, Wu; Dai, Lirong; Du, Jun
Format:	Preprint
Published:	2020
Subjects:	Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2002.06049

Similar Items

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization
by: Chen, Yafeng, et al.
Published: (2024)

Advanced Signal Analysis in Detecting Replay Attacks for Automatic Speaker Verification Systems
by: Kuang, Lee Shih
Published: (2024)

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency
by: Chen, Yafeng, et al.
Published: (2024)

Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios
by: Fiorio, Luan Vinícius, et al.
Published: (2025)

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
by: Sato, Hiroshi, et al.
Published: (2024)

Binaural Selective Attention Model for Target Speaker Extraction
by: Meng, Hanyu, et al.
Published: (2024)

Speakers Localization Using Batch EM In Unfolding Neural Network
by: Veler, Rina, et al.
Published: (2026)

Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder
by: Xie, Yuying, et al.
Published: (2024)

Tracking of Intermittent and Moving Speakers: Dataset and Metrics
by: Iatariene, Taous, et al.
Published: (2025)

A Stage-Wise Learning Strategy with Fixed Anchors for Robust Speaker Verification
by: Gu, Bin, et al.
Published: (2025)

Robustness of Speech Separation Models for Similar-pitch Speakers
by: Lay, Bunlong, et al.
Published: (2024)

Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2024)

Solid State Bus-Comp: A Large-Scale and Diverse Dataset for Dynamic Range Compressor Virtual Analog Modeling
by: Gu, Yicheng, et al.
Published: (2025)

Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2023)

Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2025)

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable
by: Xu, Yanze, et al.
Published: (2026)

Multi-channel Replay Speech Detection using an Adaptive Learnable Beamformer
by: Neri, Michael, et al.
Published: (2025)

SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
by: Kim, Miseul, et al.
Published: (2025)

Zero-Bit Transmission of Adaptive Pre- and De-emphasis Filters for Speech and Audio Coding
by: Piralideh, Niloofar Omidi, et al.
Published: (2024)

Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification
by: Liu, Weixin, et al.
Published: (2026)

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios
by: Jiang, Ya, et al.
Published: (2024)

Speech-preserving active noise control: a deep learning approach in reverberant environments
by: Dai, Shuning
Published: (2026)

Breaking Speaker Recognition with PaddingBack
by: Ye, Zhe, et al.
Published: (2023)

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)

A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions
by: Du, Hui-Peng, et al.
Published: (2024)

SIRUP: A diffusion-based virtual upmixer of steering vectors for highly-directive spatialization with first-order ambisonics
by: Picard, Emilio, et al.
Published: (2026)

Self-Tuning Spectral Clustering for Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)

ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
by: Yang, Shu-wen, et al.
Published: (2025)

Aliasing-Free Neural Audio Synthesis
by: Gu, Yicheng, et al.
Published: (2025)

FUN-SSL: Full-band Layer Followed by U-Net with Narrow-band Layers for Multiple Moving Sound Source Localization
by: Choi, Yuseon, et al.
Published: (2025)

StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation
by: Kuzmin, Nikita, et al.
Published: (2026)

SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
by: Yao, Shengshi, et al.
Published: (2025)

Automotive sound field reproduction using deep optimization with spatial domain constraint
by: Qian, Yufan, et al.
Published: (2025)

Binaural Localization Model for Speech in Noise
by: Tokala, Vikas, et al.
Published: (2025)

Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming
by: Mittal, Manan, et al.
Published: (2026)

FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025)

Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
by: Neri, Michael, et al.
Published: (2026)