:: Library Catalog

Bibliographic Details
Main Authors:	Chan, Cedric; Kuang, Jianjing
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2511.02104

Similar Items

Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
by: Jiang, Yuepeng, et al.
Published: (2024)

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
by: Qian, Kaizhi, et al.
Published: (2025)

Position: Towards Responsible Evaluation for Text-to-Speech
by: Yang, Yifan, et al.
Published: (2025)

Prosody Labeling with Phoneme-BERT and Speech Foundation Models
by: Koriyama, Tomoki
Published: (2025)

Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
by: Tsiamas, Ioannis, et al.
Published: (2024)

Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition
by: Girish, et al.
Published: (2026)

A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings
by: Mondal, Anindita, et al.
Published: (2024)

DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
by: Oh, Hyung-Seok, et al.
Published: (2023)

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
by: Wang, Hui, et al.
Published: (2025)

Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis
by: Mayer, Paul, et al.
Published: (2025)

PRESENT: Zero-Shot Text-to-Prosody Control
by: Lam, Perry, et al.
Published: (2024)

Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)

Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)

FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)

Privacy-preserving Prosody Representation Learning
by: Everson, Kevin, et al.
Published: (2026)

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech
by: de Groot, Dimme, et al.
Published: (2025)

Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
by: Du, Chenpeng, et al.
Published: (2021)

FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency
by: Liu, Rui, et al.
Published: (2024)

A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer
by: Maurya, Himanshu, et al.
Published: (2024)

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
by: Zhou, Junzuo, et al.
Published: (2024)

Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM
by: Cui, Wenqian, et al.
Published: (2026)

Balalaika: Data-Centric, Prosody-Aware Annotation Pipeline for Russian Speech
by: Borodin, Kirill, et al.
Published: (2025)

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
by: He, Xiangheng, et al.
Published: (2024)

No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
by: Shin, Seungyoun, et al.
Published: (2025)

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)

Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
by: Bereuter, Paul A., et al.
Published: (2025)

Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)

Interleaved Speech-Text Language Models for Simple Streaming Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2024)

Towards Explainable Spoofed Speech Attribution and Detection: A Probabilistic Approach for Characterizing Speech Synthesizer Components
by: Mishra, Jagabandhu, et al.
Published: (2025)

Prosody Analysis of Audiobooks
by: Pethe, Charuta, et al.
Published: (2023)

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)

AutoProsody: A Prosodic Feature Extraction Tool for Indian Languages
by: Thinakaran, Preethi, et al.
Published: (2025)

Investigating Training Objectives for Generative Speech Enhancement
by: Richter, Julius, et al.
Published: (2024)

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak
by: Kakouros, Sofoklis, et al.
Published: (2026)

Toward Natural Emotional Text-To-Speech System with Fine-Grained Non-Verbal Expression Control
by: Zhou, Wangzixi, et al.
Published: (2026)

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)

Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts
by: Kuhlmann, Michael, et al.
Published: (2026)