Library Catalog

Bibliographic Details
Main Authors:	Zhu, Yongxin; Su, Dan; He, Liqiang; Xu, Linli; Yu, Dong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.00976

Similar Items

Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
by: Du, Yichao, et al.
Published: (2024)

Scaling Speech-Text Pre-training with Synthetic Interleaved Data
by: Zeng, Aohan, et al.
Published: (2024)

Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training
by: Dong, Lukuan, et al.
Published: (2024)

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)

BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation
by: Wang, Chen, et al.
Published: (2024)

BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
by: Wang, Chen, et al.
Published: (2023)

GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
by: Gao, Yingying, et al.
Published: (2024)

Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)

Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection
by: Pan, Zihan, et al.
Published: (2024)

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization
by: Tang, Yun, et al.
Published: (2025)

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)

SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
by: Hu, Ke, et al.
Published: (2025)

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
by: Yan, Haoqiu, et al.
Published: (2024)

MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
by: Sun, Haiyang, et al.
Published: (2023)

Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model
by: Chen, Zhonglong, et al.
Published: (2024)

Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)

Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)

Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study
by: An, Keyu, et al.
Published: (2024)

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
by: Dong, Zhongren, et al.
Published: (2024)

Long-Form Speech Generation with Spoken Language Models
by: Park, Se Jin, et al.
Published: (2024)

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

SSR: Alignment-Aware Modality Connector for Speech Language Models
by: Tan, Weiting, et al.
Published: (2024)

SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
by: Sun, Chunyu, et al.
Published: (2025)

SpeechAlign: Aligning Speech Generation to Human Preferences
by: Zhang, Dong, et al.
Published: (2024)

Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
by: Ma, Zhengrui, et al.
Published: (2025)

Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
by: Lemerle, Théodor, et al.
Published: (2024)

Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2024)

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
by: Zhang, Dong, et al.
Published: (2024)

AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning
by: Shao, Yiwen, et al.
Published: (2025)

From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language
by: Sharif, Muhammad, et al.
Published: (2024)

Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models
by: Ljubešić, Nikola, et al.
Published: (2025)

Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection
by: Li, Zhu, et al.
Published: (2025)

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
by: Wang, Dingdong, et al.
Published: (2025)

S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models
by: Fang, Yuanbo, et al.
Published: (2025)

Recent Advances in Speech Language Models: A Survey
by: Cui, Wenqian, et al.
Published: (2024)

Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)

ESPnet-SpeechLM: An Open Speech Language Model Toolkit
by: Tian, Jinchuan, et al.
Published: (2025)