Library Catalog

Bibliographic Details
Main Authors:	Meng, Qingliang; Ren, Pengju; Li, Tian; Dai, Changsong; Liang, Huizhi
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.10058

Similar Items

FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis
by: Meng, Qingliang, et al.
Published: (2025)

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
by: Wang, Peng, et al.
Published: (2023)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
by: Wu, Haibin, et al.
Published: (2025)

Speech Recognition Rescoring with Large Speech-Text Foundation Models
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2024)

Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)

Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
by: Meng, Yangyang, et al.
Published: (2025)

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)

Sequential Editing for Lifelong Training of Speech Recognition Models
by: Kulshreshtha, Devang, et al.
Published: (2024)

Multi-stage Large Language Model Correction for Speech Recognition
by: Pu, Jie, et al.
Published: (2023)

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)

Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
by: Corrêa, Pedro, et al.
Published: (2025)

Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)

Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
by: Nozawa, Kento, et al.
Published: (2024)

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)

TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)

Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
by: Tseng, Yuan, et al.
Published: (2025)

Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
by: Allbert, Rumi, et al.
Published: (2025)

Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
by: Wang, Qiongqiong, et al.
Published: (2025)

Investigating the Impact of Word Informativeness on Speech Emotion Recognition
by: Kakouros, Sofoklis
Published: (2025)

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study
by: Min, Zeping, et al.
Published: (2023)

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System
by: Meng, Lingwei, et al.
Published: (2024)

Full-text Error Correction for Chinese Speech Recognition with Large Language Model
by: Tang, Zhiyuan, et al.
Published: (2024)

Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models
by: Yusuf, Bolaji, et al.
Published: (2024)

Benchmarking Automatic Speech Recognition Models for African Languages
by: Nahabwe, Alvin, et al.
Published: (2025)

Customizing Speech Recognition Model with Large Language Model Feedback
by: Ling, Shaoshi, et al.
Published: (2025)

Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities
by: Adila, Aulia, et al.
Published: (2024)

Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought
by: Zhao, Zhixian, et al.
Published: (2025)

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024)

Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
by: Li, Jinpeng, et al.
Published: (2024)

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
by: Fu, Li, et al.
Published: (2025)

Scaling Analysis of Interleaved Speech-Text Language Models
by: Maimon, Gallil, et al.
Published: (2025)

Granary: Speech Recognition and Translation Dataset in 25 European Languages
by: Koluguri, Nithin Rao, et al.
Published: (2025)

Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)