| Main Authors: | Iliescu, Dan Andrei, Mohan, Devang Savita Ram, Teh, Tian Huey, Hodari, Zack |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2303.09446 |
Similar Items
No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
by: Shin, Seungyoun, et al.
Published: (2025)
Maestro-EVC: Controllable Emotional Voice Conversion Guided by References and Explicit Prosody
by: Yoon, Jinsung, et al.
Published: (2025)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
by: Zhong, Jinzuomu, et al.
Published: (2023)
RepCNN: Micro-sized, Mighty Models for Wakeword Detection
by: Kundu, Arnav, et al.
Published: (2024)
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer
by: Maurya, Himanshu, et al.
Published: (2024)
ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
by: Pan, Jianan, et al.
Published: (2026)
ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs
by: Eren, Eray, et al.
Published: (2025)
Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis
by: Mayer, Paul, et al.
Published: (2025)
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
by: Oh, Hyung-Seok, et al.
Published: (2023)
NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding
by: Bataev, Vladimir, et al.
Published: (2025)
FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
by: Grigoryan, Lilit, et al.
Published: (2025)
Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
by: Grigoryan, Lilit, et al.
Published: (2025)
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
by: Chen, William, et al.
Published: (2025)
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
by: Noroozi, Vahid, et al.
Published: (2024)
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation
by: Gunduz, Ahmet, et al.
Published: (2024)
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
by: Ciobanu, Ioan-Paul, et al.
Published: (2025)
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
by: Qian, Kaizhi, et al.
Published: (2025)
Usefulness of Emotional Prosody in Neural Machine Translation
by: Brazier, Charles, et al.
Published: (2024)
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models
by: Lee, Kyowoon, et al.
Published: (2025)
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025)
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
by: Li, Zehua Kcriss, et al.
Published: (2024)
WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling
by: Yang, Guanrou, et al.
Published: (2026)
Do Music Generation Models Encode Music Theory?
by: Wei, Megan, et al.
Published: (2024)
PRESENT: Zero-Shot Text-to-Prosody Control
by: Lam, Perry, et al.
Published: (2024)
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
by: Jia, Dongya, et al.
Published: (2025)
MoonCast: High-Quality Zero-Shot Podcast Generation
by: Ju, Zeqian, et al.
Published: (2025)
Controlling Surprisal in Music Generation via Information Content Curve Matching
by: Bjare, Mathias Rose, et al.
Published: (2024)
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
by: Chen, Li-Wei, et al.
Published: (2025)
TiCo: Time-Controllable Spoken Dialogue Model
by: Chang, Kai-Wei, et al.
Published: (2026)
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
by: Yoo, Suho, et al.
Published: (2025)
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
by: Hu, Yuchen, et al.
Published: (2024)
Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
by: Elmakies, Avishai, et al.
Published: (2025)
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
by: Yang, Chao-Han Huck, et al.
Published: (2023)
AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering
by: Kuan, Chun-Yi, et al.
Published: (2026)
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
by: Mitsui, Kentaro, et al.
Published: (2024)
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
by: Wang, Zixuan, et al.
Published: (2024)
Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South
by: Mehta, Atharva, et al.
Published: (2024)