| Main Authors: | Woszczyk, Dominika, Ribeiro, Manuel Sam, Merritt, Thomas, Korzekwa, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.09310 |
Similar Items
Factor-Conditioned Speaking-Style Captioning
by: Ando, Atsushi, et al.
Published: (2024)
ClaritySpeech: Dementia Obfuscation in Speech
by: Woszczyk, Dominika, et al.
Published: (2025)
Prosody-Driven Privacy-Preserving Dementia Detection
by: Woszczyk, Dominika, et al.
Published: (2024)
Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning
by: Ohnaka, Hien, et al.
Published: (2025)
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
by: Zhang, Yu, et al.
Published: (2023)
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
by: Kang, Wonjune, et al.
Published: (2025)
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)
Maestro-EVC: Controllable Emotional Voice Conversion Guided by References and Explicit Prosody
by: Yoon, Jinsung, et al.
Published: (2025)
Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)
Revisiting Acoustic Features for Robust ASR
by: Shah, Muhammad A., et al.
Published: (2024)
AdaptVC: High Quality Voice Conversion with Adaptive Learning
by: Kim, Jaehun, et al.
Published: (2025)
Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning
by: Yang, Qian, et al.
Published: (2025)
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
by: Abdullah, Badr M., et al.
Published: (2025)
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)
Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion
by: Tripathi, Kumud, et al.
Published: (2025)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
by: Zhan, Jun, et al.
Published: (2025)
Building Tailored Speech Recognizers for Japanese Speaking Assessment
by: Kubo, Yotaro, et al.
Published: (2025)
A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings
by: Geng, Haopeng, et al.
Published: (2024)
EMALG: An Enhanced Mandarin Lombard Grid Corpus with Meaningful Sentences
by: Li, Baifeng, et al.
Published: (2023)
Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations
by: Meghanani, Amit, et al.
Published: (2024)
VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)
The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2026)
Voice Adaptation for Swiss German
by: Stucki, Samuel, et al.
Published: (2025)
Marco-Voice Technical Report
by: Tian, Fengping, et al.
Published: (2025)
Advancing Automated Speaking Assessment Leveraging Multifaceted Relevance and Grammar Information
by: Lu, Hao-Chien, et al.
Published: (2025)
Infusing Acoustic Pause Context into Text-Based Dementia Assessment
by: Braun, Franziska, et al.
Published: (2024)
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
by: Wang, Chung-Chun, et al.
Published: (2025)
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)
Towards General-Purpose Text-Instruction-Guided Voice Conversion
by: Kuan, Chun-Yi, et al.
Published: (2023)
Alethia: A Foundational Encoder for Voice Deepfakes
by: Zhu, Yi, et al.
Published: (2026)
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
by: Zhang, Qinglin, et al.
Published: (2024)
Scalable Offline ASR for Command-Style Dictation in Courtrooms
by: Nethil, Kumarmanas, et al.
Published: (2025)
A Theoretical Framework for Acoustic Neighbor Embeddings
by: Jeon, Woojay
Published: (2024)
Exploring the Benefits of Tokenization of Discrete Acoustic Units
by: Dekel, Avihu, et al.
Published: (2024)