Library Catalog

Bibliographic Details
Main Authors:	Li, Zhipeng, Xing, Xiaofen, Wang, Jun, Chen, Shuaiqi, Yu, Guoqiao, Wan, Guanglu, Xu, Xiangmin
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.05730

Similar Items

Long-Context Speech Synthesis with Context-Aware Memory
by: Li, Zhipeng, et al.
Published: (2025)

Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
by: Xing, Jingyuan, et al.
Published: (2025)

Multi-Scale Temporal Transformer For Speech Emotion Recognition
by: Li, Zhipeng, et al.
Published: (2024)

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
by: Chen, Weidong, et al.
Published: (2023)

LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space
by: Xin, Detai, et al.
Published: (2026)

S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models
by: Fang, Yuanbo, et al.
Published: (2025)

MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)

HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-based TTS
by: Nie, Sihang, et al.
Published: (2025)

LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
by: Zhao, Xiaohan, et al.
Published: (2025)

Interleaved Speech-Text Language Models for Simple Streaming Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2024)

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)

Textless and Non-Parallel Speech-to-Speech Emotion Style Transfer
by: Dutta, Soumya, et al.
Published: (2025)

ISSE: An Instruction-Guided Speech Style Editing Dataset And Benchmark
by: Chen, Yun, et al.
Published: (2025)

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
by: Li, Song, et al.
Published: (2024)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)

Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)

SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
by: Wang, Huimeng, et al.
Published: (2026)

Emotion-Coherent Speech Data Augmentation and Self-Supervised Contrastive Style Training for Enhancing Kids's Story Speech Synthesis
by: Chung, Raymond
Published: (2026)

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
by: Wang, Helin, et al.
Published: (2024)

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)

Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts
by: Kuhlmann, Michael, et al.
Published: (2026)

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
by: Li, Yinghao Aaron, et al.
Published: (2024)

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)

DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions
by: Chen, Weidong, et al.
Published: (2025)

Speech-Omni-Lite: Portable Speech Interfaces for Vision-Language Models
by: Tao, Dehua, et al.
Published: (2026)

RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
by: Shi, Haoxiang, et al.
Published: (2024)

Adapting Speech Foundation Models for Unified Multimodal Speech Recognition with Large Language Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)

Debatts: Zero-Shot Debating Text-to-Speech Synthesis
by: Huang, Yiqiao, et al.
Published: (2024)

UniTalker: Conversational Speech-Visual Synthesis
by: Hu, Yifan, et al.
Published: (2025)

FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
by: Zhang, Chu Yuan, et al.
Published: (2023)

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation
by: Liu, Wenrui, et al.
Published: (2025)

Hierarchical Control of Emotion Rendering in Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)

A Survey on Speech Large Language Models for Understanding
by: Peng, Jing, et al.
Published: (2024)

Adaptive Speaker Embedding Self-Augmentation for Personal Voice Activity Detection with Short Enrollment Speech
by: Feng, Fuyuan, et al.
Published: (2026)

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
by: Ji, Shengpeng, et al.
Published: (2023)

EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens
by: Park, Joonyong, et al.
Published: (2025)