| Main Authors: | Cheng, Gaofeng, Lu, Haitian, Yang, Chengxu, Wang, Xuyang, Li, Ta, Yan, Yonghong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.00804 |
Similar Items
PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
by: Fu, Li, et al.
Published: (2025)
SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
by: Lu, Haitian, et al.
Published: (2025)
Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment
by: Wang, Ke, et al.
Published: (2025)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
by: Shakeel, Muhammad, et al.
Published: (2024)
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)
Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition
by: Zhu, Han, et al.
Published: (2024)
Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems
by: Ren, Bo, et al.
Published: (2025)
Improving ASR Contextual Biasing with Guided Attention
by: Tang, Jiyang, et al.
Published: (2024)
Pronunciation Assessment with Multi-modal Large Language Models
by: Fu, Kaiqi, et al.
Published: (2024)
OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)
Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling
by: Gu, Yue, et al.
Published: (2025)
Towards Unsupervised Speech Recognition Without Pronunciation Models
by: Ni, Junrui, et al.
Published: (2024)
Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment
by: Wang, Ke, et al.
Published: (2025)
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
by: Sun, Guangzhi, et al.
Published: (2022)
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
by: Nakagome, Yu, et al.
Published: (2025)
UtterTune: LoRA-Based Target-Language Pronunciation Edit and Control in Multilingual Text-to-Speech
by: Kato, Shuhei
Published: (2025)
Text Injection for Neural Contextual Biasing
by: Meng, Zhong, et al.
Published: (2024)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)
Segmentation-free Goodness of Pronunciation
by: Cao, Xinwei, et al.
Published: (2025)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement
by: Nguyen, Tuan-Nam, et al.
Published: (2025)
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
by: Ellinas, Nikolaos, et al.
Published: (2022)
A Neural Model for Contextual Biasing Score Learning and Filtering
by: Huang, Wanting, et al.
Published: (2025)
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
by: Sun, Siqi, et al.
Published: (2024)
Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems
by: Kulkarni, Ajinkya, et al.
Published: (2025)
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions
by: Nakagome, Yu, et al.
Published: (2024)
Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss
by: Chao, Fu-An, et al.
Published: (2025)
Automatic Speech Recognition Biases in Newcastle English: an Error Analysis
by: Serditova, Dana, et al.
Published: (2025)
ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
by: Yan, Bi-Cheng, et al.
Published: (2024)
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
by: Sudo, Yui, et al.
Published: (2024)
MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios
by: Chen, Yu-Wen, et al.
Published: (2023)
Revisiting Interpolation Augmentation for Speech-to-Text Generation
by: Xu, Chen, et al.
Published: (2024)
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
by: Mujtaba, Dena, et al.
Published: (2024)
K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function
by: Li, Shuhe, et al.
Published: (2025)
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)
Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)
Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning
by: Abdelfattah, Abdullah, et al.
Published: (2025)
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
by: Wang, Siyin, et al.
Published: (2024)