Bibliographic Details
Main Authors:	Lu, Junyu, Jiang, Di, Hong, Mengze, Wei, Victor Junqiu, Guo, Qintian, Su, Zhiyang
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language
Online Access:	https://arxiv.org/abs/2509.04393

Similar Items

ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
by: Wei, Victor Junqiu, et al.
Published: (2024)

Acoustic Model Optimization over Multiple Data Sources: Merging and Valuation
by: Wei, Victor Junqiu, et al.
Published: (2024)

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings
by: Li, Christopher, et al.
Published: (2024)

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)

Elderly-Contextual Data Augmentation via Speech Synthesis for Elderly ASR
by: Lee, Minsik, et al.
Published: (2026)

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
by: Sudo, Yui, et al.
Published: (2024)

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
by: Guo, Jiaxin, et al.
Published: (2024)

Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
by: Wang, Dingdong, et al.
Published: (2025)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)

Frontend Token Enhancement for Token-Based Speech Recognition
by: Ashihara, Takanori, et al.
Published: (2026)

CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation
by: Poon, Crystal Min Hui, et al.
Published: (2025)

DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
by: Baali, Massa, et al.
Published: (2025)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)

Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition
by: Hong, Mengze, et al.
Published: (2026)

Next Tokens Denoising for Speech Synthesis
by: Liu, Yanqing, et al.
Published: (2025)

STAB: Speech Tokenizer Assessment Benchmark
by: Vashishth, Shikhar, et al.
Published: (2024)

TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2026)

POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)

Technical Report: A Practical Guide to Kaldi ASR Optimization
by: Hong, Mengze, et al.
Published: (2025)

Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)

Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)

Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild
by: Durmus, Berkin, et al.
Published: (2026)

LAST: Language Model Aware Speech Tokenization
by: Turetzky, Arnon, et al.
Published: (2024)

Semantic Codebooks as Effective Priors for Neural Speech Compression
by: Bai, Liuyang, et al.
Published: (2025)

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)

DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
by: Sudo, Yui, et al.
Published: (2025)

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)

Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
by: Shakeel, Muhammad, et al.
Published: (2024)

Representing Speech Through Autoregressive Prediction of Cochlear Tokens
by: Tuckute, Greta, et al.
Published: (2025)

Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)

Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)

Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
by: Wang, Peidong
Published: (2026)

PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)