:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.15846

Similar Items

A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction
by: Wang, Jianjin, et al.
Published: (2025)

Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)

Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
by: Du, Yichao, et al.
Published: (2024)

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024)

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)

Scaling Speech-Text Pre-training with Synthetic Interleaved Data
by: Zeng, Aohan, et al.
Published: (2024)

Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora
by: Xu, Jing, et al.
Published: (2024)

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
by: Zhu, Yongxin, et al.
Published: (2024)

Optimizing Speech Multi-View Feature Fusion through Conditional Computation
by: Shan, Weiqiao, et al.
Published: (2025)

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
by: He, Haorui, et al.
Published: (2024)

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
by: Wu, Haibin, et al.
Published: (2025)

Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
by: Allbert, Rumi, et al.
Published: (2025)

RAG-Boost: Retrieval-Augmented Generation Enhanced LLM-based Speech Recognition
by: Wang, Pengcheng, et al.
Published: (2025)

Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
by: Liu, Rui, et al.
Published: (2024)

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)

Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)

Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
by: Fucci, Dennis, et al.
Published: (2024)

The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation
by: Tsai, Yun-Shao, et al.
Published: (2026)

Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)

Speech Retrieval-Augmented Generation without Automatic Speech Recognition
by: Min, Do June, et al.
Published: (2024)

SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
by: Sun, Chunyu, et al.
Published: (2025)

Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)

Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)

Long-Form Speech Generation with Spoken Language Models
by: Park, Se Jin, et al.
Published: (2024)

Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)

Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)

SpeechAlign: Aligning Speech Generation to Human Preferences
by: Zhang, Dong, et al.
Published: (2024)

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
by: Fang, Yangui, et al.
Published: (2025)