| Main Authors: | Lu, Junyu, Jiang, Di, Hong, Mengze, Wei, Victor Junqiu, Guo, Qintian, Su, Zhiyang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.04393 |
Similar Items
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
by: Wei, Victor Junqiu, et al.
Published: (2024)
Acoustic Model Optimization over Multiple Data Sources: Merging and Valuation
by: Wei, Victor Junqiu, et al.
Published: (2024)
High-precision Voice Search Query Correction via Retrievable Speech-text Embedings
by: Li, Christopher, et al.
Published: (2024)
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)
Elderly-Contextual Data Augmentation via Speech Synthesis for Elderly ASR
by: Lee, Minsik, et al.
Published: (2026)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
by: Sudo, Yui, et al.
Published: (2024)
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
by: Guo, Jiaxin, et al.
Published: (2024)
Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
by: Wang, Dingdong, et al.
Published: (2025)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)
Frontend Token Enhancement for Token-Based Speech Recognition
by: Ashihara, Takanori, et al.
Published: (2026)
CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation
by: Poon, Crystal Min Hui, et al.
Published: (2025)
DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
by: Baali, Massa, et al.
Published: (2025)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)
Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition
by: Hong, Mengze, et al.
Published: (2026)
Next Tokens Denoising for Speech Synthesis
by: Liu, Yanqing, et al.
Published: (2025)
STAB: Speech Tokenizer Assessment Benchmark
by: Vashishth, Shikhar, et al.
Published: (2024)
TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2026)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
Technical Report: A Practical Guide to Kaldi ASR Optimization
by: Hong, Mengze, et al.
Published: (2025)
Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)
Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)
Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild
by: Durmus, Berkin, et al.
Published: (2026)
LAST: Language Model Aware Speech Tokenization
by: Turetzky, Arnon, et al.
Published: (2024)
Semantic Codebooks as Effective Priors for Neural Speech Compression
by: Bai, Liuyang, et al.
Published: (2025)
Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)
OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)
DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
by: Sudo, Yui, et al.
Published: (2025)
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)
Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
by: Shakeel, Muhammad, et al.
Published: (2024)
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
by: Tuckute, Greta, et al.
Published: (2025)
Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)
Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)
Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
by: Wang, Peidong
Published: (2026)
PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)