:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Zhao, Guanlong, Wang, Yongqiang, Pelecanos, Jason, Zhang, Yu, Liao, Hank, Huang, Yiling, Lu, Han, Wang, Quan
Format:	Preprint
Published:	2023
Subjects:	Audio and Speech Processing Machine Learning Sound
Online Access:	https://arxiv.org/abs/2309.08023
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024)

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)

USM RNN-T model weights binarization
by: Rybakov, Oleg, et al.
Published: (2024)

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)

Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)

Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)

USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
by: Li, Na, et al.
Published: (2025)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
by: Huang, Wen, et al.
Published: (2024)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)

Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering
by: Medennikov, Ivan, et al.
Published: (2025)

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Utilizing Speaker Profiles for Impersonation Audio Detection
by: Gu, Hao, et al.
Published: (2024)

Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings
by: Clarke, Jason, et al.
Published: (2025)

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)

Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)

Probing the Feasibility of Multilingual Speaker Anonymization
by: Meyer, Sarina, et al.
Published: (2024)

Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet
by: Gao, Wenmiao, et al.
Published: (2025)

Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)

Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)

Profile-Error-Tolerant Target-Speaker Voice Activity Detection
by: Wang, Dongmei, et al.
Published: (2023)

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)

Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
by: Thienpondt, Jenthe, et al.
Published: (2024)

oboVox Far Field Speaker Recognition: A Novel Data Augmentation Approach with Pretrained Models
by: Dip, Muhammad Sudipto Siam, et al.
Published: (2024)

On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: HU, Shujie, et al.
Published: (2025)

TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice
by: Farhadipour, Aref, et al.
Published: (2026)

NeuralMultiling: A Novel Neural Architecture Search for Smartphone based Multilingual Speaker Verification
by: PN, Aravinda Reddy, et al.
Published: (2024)

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
by: Feng, Tiantian, et al.
Published: (2025)

Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)

Mitigating Language Mismatch in SSL-Based Speaker Anonymization
by: Zhang, Zhe, et al.
Published: (2025)

Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)

The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
by: Jain, Sarthak, et al.
Published: (2024)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)