| Main Authors: | Webber, Jacob J, Watts, Oliver, Henter, Gustav Eje, Williams, Jennifer, King, Simon |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.14919 |
Similar Items
When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)
Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)
Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)
The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
by: Torgashov, Nikita, et al.
Published: (2025)
VoXtream2: Full-stream TTS with dynamic speaking rate control
by: Torgashov, Nikita, et al.
Published: (2026)
HiFi-Glot: High-Fidelity Neural Formant Synthesis with Differentiable Resonant Filters
by: Gu, Yicheng, et al.
Published: (2024)
Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring
by: Webber, Jacob J, et al.
Published: (2025)
Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction
by: Guichoux, Téo, et al.
Published: (2025)
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
by: Seki, Kentaro, et al.
Published: (2024)
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
by: Chen, Wei, et al.
Published: (2024)
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
by: Bargum, Anders R., et al.
Published: (2024)
Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework
by: Byun, Kyungguen, et al.
Published: (2025)
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
by: Zhu, Xinfa, et al.
Published: (2025)
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
by: Dhar, Sandipan, et al.
Published: (2025)
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
by: Sha, Binzhu, et al.
Published: (2023)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
by: Guo, Chenyang, et al.
Published: (2024)
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
by: Hou, Yixuan, et al.
Published: (2025)
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
by: Zhang, Xueyao, et al.
Published: (2023)
Generating Novel and Realistic Speakers for Voice Conversion
by: Chen, Meiying Melissa, et al.
Published: (2025)
LatentVoiceGrad: Nonparallel Voice Conversion with Latent Diffusion/Flow-Matching Models
by: Kameoka, Hirokazu, et al.
Published: (2025)
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics
by: Kameoka, Hirokazu, et al.
Published: (2020)
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
by: Zhou, Wangjin, et al.
Published: (2024)
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
by: Du, Zongyang, et al.
Published: (2024)
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
by: Ruggiero, Giuseppe, et al.
Published: (2024)
OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2026)
Residual Speaker Representation for One-Shot Voice Conversion
by: Xu, Le, et al.
Published: (2023)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models
by: Kim, Heeseung, et al.
Published: (2025)
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion
by: Dhar, Sandipan, et al.
Published: (2025)
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
by: Biyani, Ishan D., et al.
Published: (2025)
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
by: Byun, Kyungguen, et al.
Published: (2024)
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
by: Saito, Yuki, et al.
Published: (2024)
An Extensive Analysis of the Singing Voice Conversion Challenge 2025 Evaluation Results
by: Violeta, Lester Phillip, et al.
Published: (2025)
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
by: Kang, Wonjune, et al.
Published: (2022)
FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
by: Ferreira, Alef Iury Siqueira, et al.
Published: (2025)
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
by: Kim, Tae-Woo, et al.
Published: (2022)
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
by: Li, Ruiqi, et al.
Published: (2024)