Saved in:
| Main Authors: | Zhao, Guanlong, Wang, Yongqiang, Pelecanos, Jason, Zhang, Yu, Liao, Hank, Huang, Yiling, Lu, Han, Wang, Quan |
|---|---|
| Format: | Preprint |
| Published: |
2023
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2309.08023 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024)
by: Wang, Quan, et al.
Published: (2024)
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)
by: Wang, Quan, et al.
Published: (2022)
USM RNN-T model weights binarization
by: Rybakov, Oleg, et al.
Published: (2024)
by: Rybakov, Oleg, et al.
Published: (2024)
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)
by: Ding, Shaojin, et al.
Published: (2023)
Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)
by: Horiguchi, Shota, et al.
Published: (2025)
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)
by: Li, Yue, et al.
Published: (2024)
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)
by: Zhang, Li, et al.
Published: (2024)
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)
by: Wang, Weiqing, et al.
Published: (2024)
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
by: Li, Na, et al.
Published: (2025)
by: Li, Na, et al.
Published: (2025)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
by: Chen, Zhengyang, et al.
Published: (2024)
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
by: Huang, Wen, et al.
Published: (2024)
by: Huang, Wen, et al.
Published: (2024)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
by: Zhang, Ke, et al.
Published: (2024)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
by: Wang, Weiqing, et al.
Published: (2025)
Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering
by: Medennikov, Ivan, et al.
Published: (2025)
by: Medennikov, Ivan, et al.
Published: (2025)
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)
by: Quan, Changsheng, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
by: Fu, Ruibo, et al.
Published: (2024)
Utilizing Speaker Profiles for Impersonation Audio Detection
by: Gu, Hao, et al.
Published: (2024)
by: Gu, Hao, et al.
Published: (2024)
Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings
by: Clarke, Jason, et al.
Published: (2025)
by: Clarke, Jason, et al.
Published: (2025)
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)
by: Liao, Shijia, et al.
Published: (2024)
Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)
by: Hu, Shujie, et al.
Published: (2024)
An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)
by: You, Zhenghai, et al.
Published: (2025)
Probing the Feasibility of Multilingual Speaker Anonymization
by: Meyer, Sarina, et al.
Published: (2024)
by: Meyer, Sarina, et al.
Published: (2024)
Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet
by: Gao, Wenmiao, et al.
Published: (2025)
by: Gao, Wenmiao, et al.
Published: (2025)
Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)
by: Ren, Yiming, et al.
Published: (2025)
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)
by: Zhou, Zhenyu, et al.
Published: (2024)
Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)
by: Wang, Qing, et al.
Published: (2024)
Profile-Error-Tolerant Target-Speaker Voice Activity Detection
by: Wang, Dongmei, et al.
Published: (2023)
by: Wang, Dongmei, et al.
Published: (2023)
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)
by: Ku, Pin-Jui, et al.
Published: (2024)
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
by: Thienpondt, Jenthe, et al.
Published: (2024)
by: Thienpondt, Jenthe, et al.
Published: (2024)
oboVox Far Field Speaker Recognition: A Novel Data Augmentation Approach with Pretrained Models
by: Dip, Muhammad Sudipto Siam, et al.
Published: (2024)
by: Dip, Muhammad Sudipto Siam, et al.
Published: (2024)
On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: HU, Shujie, et al.
Published: (2025)
by: HU, Shujie, et al.
Published: (2025)
TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice
by: Farhadipour, Aref, et al.
Published: (2026)
by: Farhadipour, Aref, et al.
Published: (2026)
NeuralMultiling: A Novel Neural Architecture Search for Smartphone based Multilingual Speaker Verification
by: PN, Aravinda Reddy, et al.
Published: (2024)
by: PN, Aravinda Reddy, et al.
Published: (2024)
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)
by: Liao, Yen-Lun, et al.
Published: (2022)
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
by: Feng, Tiantian, et al.
Published: (2025)
by: Feng, Tiantian, et al.
Published: (2025)
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)
by: Zeng, Bang, et al.
Published: (2025)
Mitigating Language Mismatch in SSL-Based Speaker Anonymization
by: Zhang, Zhe, et al.
Published: (2025)
by: Zhang, Zhe, et al.
Published: (2025)
Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)
by: Pan, Zexu, et al.
Published: (2025)
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
by: Jain, Sarthak, et al.
Published: (2024)
by: Jain, Sarthak, et al.
Published: (2024)
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)
by: Wang, Shuai, et al.
Published: (2024)
Similar Items
-
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024) -
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022) -
USM RNN-T model weights binarization
by: Rybakov, Oleg, et al.
Published: (2024) -
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023) -
Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)