Library Catalog

Bibliographic Details
Main Authors:	Wang, Guansu; Sun, Peijie
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2511.17555

Similar Items

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)

Speech Recognition Rescoring with Large Speech-Text Foundation Models
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2024)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis
by: Wang, Xi, et al.
Published: (2026)

Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
by: Li, Jinpeng, et al.
Published: (2024)

Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
by: Chien, Chung-Ming, et al.
Published: (2024)

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024)

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)

Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)

Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus
by: Shekoufandeh, Golshid, et al.
Published: (2025)

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
by: Wu, Haibin, et al.
Published: (2025)

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)

Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)

MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems
by: Meng, Qingliang, et al.
Published: (2025)

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)

Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)

MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
by: Singh, Jaskaran, et al.
Published: (2025)

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
by: Wang, Qingzheng, et al.
Published: (2025)

An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
by: Peng, Yifan, et al.
Published: (2024)

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)

Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
by: Allbert, Rumi, et al.
Published: (2025)

Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
by: Fang, Yangui, et al.
Published: (2025)

Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition
by: Tseng, Yuan, et al.
Published: (2025)

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
by: Du, Jiayu, et al.
Published: (2024)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)

Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)

TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)

Multi-stage Large Language Model Correction for Speech Recognition
by: Pu, Jie, et al.
Published: (2023)

Revisiting Interpolation Augmentation for Speech-to-Text Generation
by: Xu, Chen, et al.
Published: (2024)

Full-text Error Correction for Chinese Speech Recognition with Large Language Model
by: Tang, Zhiyuan, et al.
Published: (2024)

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
by: Xue, Jinlong, et al.
Published: (2024)

Position-invariant Fine-tuning of Speech Enhancement Models with Self-supervised Speech Representations
by: Meghanani, Amit, et al.
Published: (2026)

Towards Unsupervised Speech Recognition Without Pronunciation Models
by: Ni, Junrui, et al.
Published: (2024)