| Main Authors: | Meng, Qingliang, Ren, Pengju, Li, Tian, Dai, Changsong, Liang, Huizhi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.10058 |
Similar Items
FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis
by: Meng, Qingliang, et al.
Published: (2025)
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
by: Wang, Peng, et al.
Published: (2023)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)
Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
by: Wu, Haibin, et al.
Published: (2025)
Speech Recognition Rescoring with Large Speech-Text Foundation Models
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2024)
Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
by: Meng, Yangyang, et al.
Published: (2025)
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)
Sequential Editing for Lifelong Training of Speech Recognition Models
by: Kulshreshtha, Devang, et al.
Published: (2024)
Multi-stage Large Language Model Correction for Speech Recognition
by: Pu, Jie, et al.
Published: (2023)
Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)
Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
by: Corrêa, Pedro, et al.
Published: (2025)
Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
by: Nozawa, Kento, et al.
Published: (2024)
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)
Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
by: Tseng, Yuan, et al.
Published: (2025)
Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
by: Allbert, Rumi, et al.
Published: (2025)
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
by: Wang, Qiongqiong, et al.
Published: (2025)
Investigating the Impact of Word Informativeness on Speech Emotion Recognition
by: Kakouros, Sofoklis
Published: (2025)
Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study
by: Min, Zeping, et al.
Published: (2023)
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System
by: Meng, Lingwei, et al.
Published: (2024)
Full-text Error Correction for Chinese Speech Recognition with Large Language Model
by: Tang, Zhiyuan, et al.
Published: (2024)
Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models
by: Yusuf, Bolaji, et al.
Published: (2024)
Benchmarking Automatic Speech Recognition Models for African Languages
by: Nahabwe, Alvin, et al.
Published: (2025)
Customizing Speech Recognition Model with Large Language Model Feedback
by: Ling, Shaoshi, et al.
Published: (2025)
Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities
by: Adila, Aulia, et al.
Published: (2024)
Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought
by: Zhao, Zhixian, et al.
Published: (2025)
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024)
Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
by: Li, Jinpeng, et al.
Published: (2024)
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)
PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
by: Fu, Li, et al.
Published: (2025)
Scaling Analysis of Interleaved Speech-Text Language Models
by: Maimon, Gallil, et al.
Published: (2025)
Granary: Speech Recognition and Translation Dataset in 25 European Languages
by: Koluguri, Nithin Rao, et al.
Published: (2025)
Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)