| Main Authors: | Chou, Hsing-Hang, Lin, Yun-Shao, Sung, Ching-Chin, Tsao, Yu, Lee, Chi-Chun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.03636 |
Similar Items
Zero-Shot Duet Singing Voices Separation with Diffusion Models
by: Yu, Chin-Yun, et al.
Published: (2023)
Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild
by: Tzeng, Jing-Tong, et al.
Published: (2025)
Revisiting Modeling and Evaluation Approaches in Speech Emotion Recognition: Considering Subjectivity of Annotators and Ambiguity of Emotions
by: Chou, Huang-Cheng, et al.
Published: (2025)
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
by: Chen, Yun, et al.
Published: (2023)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition
by: Tsai, Yun-Shao, et al.
Published: (2025)
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
by: Akti, Seymanur, et al.
Published: (2025)
REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025)
Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion
by: Wang, Kaidi, et al.
Published: (2025)
QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion
by: Sim, Youngjun, et al.
Published: (2024)
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
by: Zhu, Xinfa, et al.
Published: (2025)
EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations
by: Ren, Wenze, et al.
Published: (2024)
Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition
by: Lin, Yi-Cheng, et al.
Published: (2024)
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
by: Zhou, Wangjin, et al.
Published: (2024)
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
by: Kang, Wonjune, et al.
Published: (2022)
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
by: Lee, Philip H., et al.
Published: (2024)
CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
by: Li, Yuke, et al.
Published: (2024)
GenVC: Self-Supervised Zero-Shot Voice Conversion
by: Cai, Zexin, et al.
Published: (2025)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
by: Zhang, Xueyao, et al.
Published: (2025)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances
by: Lee, Junhyeok, et al.
Published: (2025)
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)
Zero-shot Voice Conversion with Diffusion Transformers
by: Liu, Songting
Published: (2024)
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
by: Zuo, Jialong, et al.
Published: (2025)
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
by: Chen, Meiying, et al.
Published: (2022)
MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
by: Ma, Guobin, et al.
Published: (2025)
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
by: Dai, Shuqi, et al.
Published: (2025)
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
by: Yin, Chun, et al.
Published: (2024)
Emotion-Aware Prefix: Towards Explicit Emotion Control in Voice Conversion Models
by: Yang, Haoyuan, et al.
Published: (2026)
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
by: Avdeeva, Anastasia, et al.
Published: (2024)
AdaLTM: Adaptive Layer-wise Task Vector Merging for Categorical Speech Emotion Recognition with ASR Knowledge Integration
by: Lee, Chia-Yu, et al.
Published: (2026)
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
by: Rong, Yan, et al.
Published: (2024)
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
by: Liu, Yisi, et al.
Published: (2025)
ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy
by: Wu, Ya-Tse, et al.
Published: (2026)
Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition
by: Guo, Chengling, et al.
Published: (2026)
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)
Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework
by: Byun, Kyungguen, et al.
Published: (2025)