:: Library Catalog

Bibliographic Details
Main Authors:	Woszczyk, Dominika; Ribeiro, Manuel Sam; Merritt, Thomas; Korzekwa, Daniel
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2507.09310

Similar Items

Factor-Conditioned Speaking-Style Captioning
by: Ando, Atsushi, et al.
Published: (2024)

ClaritySpeech: Dementia Obfuscation in Speech
by: Woszczyk, Dominika, et al.
Published: (2025)

Prosody-Driven Privacy-Preserving Dementia Detection
by: Woszczyk, Dominika, et al.
Published: (2024)

Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning
by: Ohnaka, Hien, et al.
Published: (2025)

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
by: Zhang, Yu, et al.
Published: (2023)

Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
by: Kang, Wonjune, et al.
Published: (2025)

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)

Maestro-EVC: Controllable Emotional Voice Conversion Guided by References and Explicit Prosody
by: Yoon, Jinsung, et al.
Published: (2025)

Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

Revisiting Acoustic Features for Robust ASR
by: Shah, Muhammad A., et al.
Published: (2024)

AdaptVC: High Quality Voice Conversion with Adaptive Learning
by: Kim, Jaehun, et al.
Published: (2025)

Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning
by: Yang, Qian, et al.
Published: (2025)

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
by: Abdullah, Badr M., et al.
Published: (2025)

Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)

Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)

Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)

Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)

Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion
by: Tripathi, Kumud, et al.
Published: (2025)

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
by: Zhan, Jun, et al.
Published: (2025)

Building Tailored Speech Recognizers for Japanese Speaking Assessment
by: Kubo, Yotaro, et al.
Published: (2025)

A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings
by: Geng, Haopeng, et al.
Published: (2024)

EMALG: An Enhanced Mandarin Lombard Grid Corpus with Meaningful Sentences
by: Li, Baifeng, et al.
Published: (2023)

Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations
by: Meghanani, Amit, et al.
Published: (2024)

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)

The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2026)

Voice Adaptation for Swiss German
by: Stucki, Samuel, et al.
Published: (2025)

Marco-Voice Technical Report
by: Tian, Fengping, et al.
Published: (2025)

Advancing Automated Speaking Assessment Leveraging Multifaceted Relevance and Grammar Information
by: Lu, Hao-Chien, et al.
Published: (2025)

Infusing Acoustic Pause Context into Text-Based Dementia Assessment
by: Braun, Franziska, et al.
Published: (2024)

A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
by: Wang, Chung-Chun, et al.
Published: (2025)

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)

Towards General-Purpose Text-Instruction-Guided Voice Conversion
by: Kuan, Chun-Yi, et al.
Published: (2023)

Alethia: A Foundational Encoder for Voice Deepfakes
by: Zhu, Yi, et al.
Published: (2026)

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
by: Zhang, Qinglin, et al.
Published: (2024)

Scalable Offline ASR for Command-Style Dictation in Courtrooms
by: Nethil, Kumarmanas, et al.
Published: (2025)

A Theoretical Framework for Acoustic Neighbor Embeddings
by: Jeon, Woojay
Published: (2024)

Exploring the Benefits of Tokenization of Discrete Acoustic Units
by: Dekel, Avihu, et al.
Published: (2024)