| Main Authors: | Cooper, Erica; Le Maguer, Sébastien; Klabbers, Esther; Yamagishi, Junichi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.03250 |
Similar Items
Towards An Integrated Approach for Expressive Piano Performance Synthesis from Music Scores
by: Tang, Jingjing, et al.
Published: (2025)
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)
Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
by: Zeng, Chang, et al.
Published: (2024)
AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)
Towards Data Drift Monitoring for Speech Deepfake Detection in the context of MLOps
by: Wang, Xin, et al.
Published: (2025)
FakeMark: Deepfake Speech Attribution With Watermarked Artifacts
by: Ge, Wanying, et al.
Published: (2025)
Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?
by: Wang, Xin, et al.
Published: (2026)
VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark
by: Zhang, Zhe, et al.
Published: (2026)
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
by: Gong, Cheng, et al.
Published: (2023)
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio
by: Zhang, Lin, et al.
Published: (2024)
Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)
Post-training for Deepfake Speech Detection
by: Ge, Wanying, et al.
Published: (2025)
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)
Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)
Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)
Explaining Speaker and Spoof Embeddings via Probing
by: Liu, Xuechen, et al.
Published: (2024)
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
by: Wellington, Scott, et al.
Published: (2025)
A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection
by: Liu, Xuechen, et al.
Published: (2024)
Human perception of audio deepfakes: the role of language and speaking style
by: San Segundo, Eugenia, et al.
Published: (2025)
From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)
Mitigating Language Mismatch in SSL-Based Speaker Anonymization
by: Zhang, Zhe, et al.
Published: (2025)
Assessing speech quality metrics for evaluation of neural audio codecs under clean speech conditions
by: Mack, Wolfgang, et al.
Published: (2025)
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
by: Gong, Cheng, et al.
Published: (2024)
LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech
by: Liu, Xuechen, et al.
Published: (2025)
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
by: Wang, Xin, et al.
Published: (2024)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper
by: Tran, Hoan My, et al.
Published: (2026)
The First VoicePrivacy Attacker Challenge
by: Tomashenko, Natalia, et al.
Published: (2025)
The First VoicePrivacy Attacker Challenge Evaluation Plan
by: Tomashenko, Natalia, et al.
Published: (2024)
Spoofing attack augmentation: can differently-trained attack models improve generalisation?
by: Ge, Wanying, et al.
Published: (2023)
ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech
by: Wang, Xin, et al.
Published: (2025)
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation
by: Panariello, Michele, et al.
Published: (2024)
Ensemble of classifiers for speech evaluation
by: Belokrylov, G., et al.
Published: (2024)
SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit
by: Huang, Wen-Chin, et al.
Published: (2025)
MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models
by: Huang, Wen-Chin, et al.
Published: (2024)
CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents
by: Huang, Wen-Chin, et al.
Published: (2026)
MUSHRA-1S: A scalable and sensitive test approach for evaluating top-tier speech processing systems
by: Lechler, Laura, et al.
Published: (2025)
Target speaker anonymization in multi-speaker recordings
by: Tomashenko, Natalia, et al.
Published: (2025)
MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling
by: Tang, Jingjing, et al.
Published: (2025)
Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
by: Huang, Ziling, et al.
Published: (2025)