:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yunyi; Yang, Shaofan; Li, Kai; Li, Xu
Format:	Preprint
Published:	2025
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.21919

Similar Items

Simi-SFX: A similarity-based conditioning method for controllable sound effect synthesis
by: Liu, Yunyi, et al.
Published: (2024)

SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
by: Li, Kai, et al.
Published: (2024)

Towards Weakly Supervised Text-to-Audio Grounding
by: Xu, Xuenan, et al.
Published: (2024)

ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
by: Liu, Yunyi, et al.
Published: (2024)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling
by: Cao, Junjie, et al.
Published: (2025)

PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data
by: Tan, Chih-Pin, et al.
Published: (2024)

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds
by: Xu, Xuenan, et al.
Published: (2024)

Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
by: Xue, Hongfei, et al.
Published: (2024)

BATON: Aligning Text-to-Audio Model with Human Preference Feedback
by: Liao, Huan, et al.
Published: (2024)

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling
by: Zheng, Qixi, et al.
Published: (2025)

Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects
by: Chu, Annie, et al.
Published: (2024)

MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
by: Inoue, Sho, et al.
Published: (2024)

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
by: Hai, Jiarui, et al.
Published: (2024)

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
by: Guo, Hao-Han, et al.
Published: (2024)

AudioSpa: Spatializing Sound Events with Text
by: Feng, Linfeng, et al.
Published: (2025)

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)

Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)

Zero-Shot Text-to-Speech from Continuous Text Streams
by: Dang, Trung, et al.
Published: (2024)

Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
by: Xing, Jingyuan, et al.
Published: (2025)

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
by: Zhang, Xueyao, et al.
Published: (2025)

AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
by: Du, Chenpeng, et al.
Published: (2024)

StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis
by: Chen, Zhiyong, et al.
Published: (2024)

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
by: Shi, Shuchen, et al.
Published: (2024)

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)

Text-Driven Voice Conversion via Latent State-Space Modeling
by: Li, Wen, et al.
Published: (2025)

CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
by: Jin, Sichen, et al.
Published: (2024)

Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)

Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning
by: Zhang, Jisi, et al.
Published: (2025)

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
by: Liu, Huadai, et al.
Published: (2024)

TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
by: Jeong, Jaeseok, et al.
Published: (2025)

Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation
by: Gu, Yi, et al.
Published: (2026)

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
by: Wang, Zehan, et al.
Published: (2025)

Fast Algorithm for Moving Sound Source
by: Yang, Dong
Published: (2025)

Evaluating Self-Supervised Speech Models via Text-Based LLMS
by: Maekaku, Takashi, et al.
Published: (2025)