| Main Authors: | Miao, Xiaoxiao, Tao, Ruijie, Zeng, Chang, Wang, Xin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.05608 |
Similar Items
The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2026)
SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)
Mitigating Language Mismatch in SSL-Based Speaker Anonymization
by: Zhang, Zhe, et al.
Published: (2025)
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge
by: Wang, Xiaopeng, et al.
Published: (2024)
An approach to optimize inference of the DIART speaker diarization pipeline
by: Aperdannier, Roman, et al.
Published: (2024)
Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)
SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
by: Tang, Beilong, et al.
Published: (2025)
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
by: Chen, Li-Wei, et al.
Published: (2024)
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
by: Miao, Xiaoxiao, et al.
Published: (2024)
Probing the Feasibility of Multilingual Speaker Anonymization
by: Meyer, Sarina, et al.
Published: (2024)
Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
by: Zeng, Chang, et al.
Published: (2024)
You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
by: Cumbal, Ronald, et al.
Published: (2024)
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
by: Huang, Wen-Chin, et al.
Published: (2024)
Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)
On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection
by: Zhu, Yi, et al.
Published: (2023)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
by: Wang, Dingdong, et al.
Published: (2025)
A Multi-Probe Audit of Clinical-Interview Depression Detection Benchmarks
by: Ishikawa, Takehiro, et al.
Published: (2026)
SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection
by: Yi, Jiangyan, et al.
Published: (2022)
S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models
by: Fang, Yuanbo, et al.
Published: (2025)
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
by: Lin, Zhaojiang, et al.
Published: (2025)
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
by: Mundada, Gagan, et al.
Published: (2025)
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
by: Zeng, Chang, et al.
Published: (2024)
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
by: Ao, Junyi, et al.
Published: (2024)
Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
by: Luo, Tianze, et al.
Published: (2025)
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
by: Ko, Yuka, et al.
Published: (2024)
SecureSpeech: Prompt-based Speaker and Content Protection
by: Hui, Belinda Soh Hui, et al.
Published: (2025)
CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges
by: Li, Hui, et al.
Published: (2025)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
by: Zhao, Ruoyu, et al.
Published: (2025)
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
by: Wei, Victor Junqiu, et al.
Published: (2024)
Vedavani: A Benchmark Corpus for ASR on Vedic Sanskrit Poetry
by: Kumar, Sujeet, et al.
Published: (2025)
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)
Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
by: Tuttösí, Paige, et al.
Published: (2025)
STAB: Speech Tokenizer Assessment Benchmark
by: Vashishth, Shikhar, et al.
Published: (2024)