| Main Authors: | Chan, Cedric, Kuang, Jianjing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.02104 |
Similar Items
Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
by: Jiang, Yuepeng, et al.
Published: (2024)
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
by: Qian, Kaizhi, et al.
Published: (2025)
Position: Towards Responsible Evaluation for Text-to-Speech
by: Yang, Yifan, et al.
Published: (2025)
Prosody Labeling with Phoneme-BERT and Speech Foundation Models
by: Koriyama, Tomoki
Published: (2025)
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
by: Tsiamas, Ioannis, et al.
Published: (2024)
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)
A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings
by: Mondal, Anindita, et al.
Published: (2024)
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
by: Oh, Hyung-Seok, et al.
Published: (2023)
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
by: Wang, Hui, et al.
Published: (2025)
Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis
by: Mayer, Paul, et al.
Published: (2025)
PRESENT: Zero-Shot Text-to-Prosody Control
by: Lam, Perry, et al.
Published: (2024)
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)
Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)
Privacy-preserving Prosody Representation Learning
by: Everson, Kevin, et al.
Published: (2026)
Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech
by: de Groot, Dimme, et al.
Published: (2025)
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
by: Du, Chenpeng, et al.
Published: (2021)
FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency
by: Liu, Rui, et al.
Published: (2024)
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer
by: Maurya, Himanshu, et al.
Published: (2024)
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
by: Zhou, Junzuo, et al.
Published: (2024)
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)
Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM
by: Cui, Wenqian, et al.
Published: (2026)
Balalaika: Data-Centric, Prosody-Aware Annotation Pipeline for Russian Speech
by: Borodin, Kirill, et al.
Published: (2025)
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
by: He, Xiangheng, et al.
Published: (2024)
No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
by: Shin, Seungyoun, et al.
Published: (2025)
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
by: Bereuter, Paul A., et al.
Published: (2025)
Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)
Interleaved Speech-Text Language Models for Simple Streaming Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2024)
Towards Explainable Spoofed Speech Attribution and Detection: a Probabilistic Approach for Characterizing Speech Synthesizer Components
by: Mishra, Jagabandhu, et al.
Published: (2025)
Prosody Analysis of Audiobooks
by: Pethe, Charuta, et al.
Published: (2023)
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)
AutoProsody: A Prosodic Feature Extraction Tool for Indian Languages
by: Thinakaran, Preethi, et al.
Published: (2025)
Investigating Training Objectives for Generative Speech Enhancement
by: Richter, Julius, et al.
Published: (2024)
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)
Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak
by: Kakouros, Sofoklis, et al.
Published: (2026)
Toward Natural Emotional Text-To-Speech System with Fine-Grained Non-Verbal Expression Control
by: Zhou, Wangzixi, et al.
Published: (2026)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts
by: Kuhlmann, Michael, et al.
Published: (2026)