:: Library Catalog

Bibliographic Details
Main Authors:	Miao, Xiaoxiao; Tao, Ruijie; Zeng, Chang; Wang, Xin
Format:	Preprint
Published:	2024
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.05608

Similar Items

The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2026)

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)

Mitigating Language Mismatch in SSL-Based Speaker Anonymization
by: Zhang, Zhe, et al.
Published: (2025)

A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge
by: Wang, Xiaopeng, et al.
Published: (2024)

An approach to optimize inference of the DIART speaker diarization pipeline
by: Aperdannier, Roman, et al.
Published: (2024)

Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)

SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
by: Tang, Beilong, et al.
Published: (2025)

VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
by: Chen, Li-Wei, et al.
Published: (2024)

Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
by: Miao, Xiaoxiao, et al.
Published: (2024)

Probing the Feasibility of Multilingual Speaker Anonymization
by: Meyer, Sarina, et al.
Published: (2024)

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
by: Zeng, Chang, et al.
Published: (2024)

You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
by: Cumbal, Ronald, et al.
Published: (2024)

Multi-speaker Text-to-speech Training with Speaker Anonymized Data
by: Huang, Wen-Chin, et al.
Published: (2024)

Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)

On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
by: Zhu, Yi, et al.
Published: (2023)

MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
by: Wang, Dingdong, et al.
Published: (2025)

A Multi-Probe Audit of Clinical-Interview Depression Detection Benchmarks
by: Ishikawa, Takehiro, et al.
Published: (2026)

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection
by: Yi, Jiangyan, et al.
Published: (2022)

S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models
by: Fang, Yuanbo, et al.
Published: (2025)

WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
by: Lin, Zhaojiang, et al.
Published: (2025)

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
by: Mundada, Gagan, et al.
Published: (2025)

InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
by: Zeng, Chang, et al.
Published: (2024)

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
by: Ao, Junyi, et al.
Published: (2024)

Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)

WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
by: Luo, Tianze, et al.
Published: (2025)

Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
by: Ko, Yuka, et al.
Published: (2024)

SecureSpeech: Prompt-based Speaker and Content Protection
by: Hui, Belinda Soh Hui, et al.
Published: (2025)

CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges
by: Li, Hui, et al.
Published: (2025)

AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)

Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
by: Zhao, Ruoyu, et al.
Published: (2025)

ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
by: Wei, Victor Junqiu, et al.
Published: (2024)

Vedavani: A Benchmark Corpus for ASR on Vedic Sanskrit Poetry
by: Kumar, Sujeet, et al.
Published: (2025)

Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)

Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)

BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
by: Tuttösí, Paige, et al.
Published: (2025)

STAB: Speech Tokenizer Assessment Benchmark
by: Vashishth, Shikhar, et al.
Published: (2024)