| Main Authors: | Pandey, Laxmi, Li, Ke, Guo, Jinxi, Paul, Debjyoti, Guo, Arthur, Mahadeokar, Jay, Zhang, Xuedong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.16664 |
Similar Items
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)
A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data
by: Tran, Minh, et al.
Published: (2025)
Effective internal language model training and fusion for factorized transducer model
by: Guo, Jinxi, et al.
Published: (2024)
A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR
by: You, Jian, et al.
Published: (2024)
A Parameter-efficient Language Extension Framework for Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)
PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
by: Kang, Wonjune, et al.
Published: (2024)
MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)
Qwen3-ASR Technical Report
by: Shi, Xian, et al.
Published: (2026)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
by: Orhon, Atila, et al.
Published: (2025)
Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
by: Deng, Keqi, et al.
Published: (2024)
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
by: Xu, Tianyi, et al.
Published: (2024)
NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)
Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
by: Irigoyen, Julian, et al.
Published: (2025)
LA-RAG:Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
by: Li, Shaojun, et al.
Published: (2024)
Quantizing Whisper-small: How design choices affect ASR performance
by: Söhler, Arthur, et al.
Published: (2025)
Towards interfacing large language models with ASR systems using confidence measures and prompting
by: Naderi, Maryam, et al.
Published: (2024)
CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)
Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
by: Everson, Kevin, et al.
Published: (2024)
The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties
by: Chen, William, et al.
Published: (2025)
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
by: Geng, Xuelong, et al.
Published: (2024)
Can Speech LLMs Think while Listening?
by: Shih, Yi-Jen, et al.
Published: (2025)
Retrieval Augmented Generation based context discovery for ASR
by: Siskos, Dimitrios, et al.
Published: (2025)
Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems
by: Ren, Bo, et al.
Published: (2025)
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
by: Wang, Yi-Wei, et al.
Published: (2023)
Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
by: Khassanov, Yerbolat, et al.
Published: (2024)
Improving ASR Contextual Biasing with Guided Attention
by: Tang, Jiyang, et al.
Published: (2024)
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
by: Nguyen, Thai-Binh, et al.
Published: (2024)
Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)
Mamba for Streaming ASR Combined with Unimodal Aggregation
by: Fang, Ying, et al.
Published: (2024)
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
by: Fang, Yangui, et al.
Published: (2025)
Robust ASR Error Correction with Conservative Data Filtering
by: Udagawa, Takuma, et al.
Published: (2024)
Building English ASR model with regional language support
by: Agrawal, Purvi, et al.
Published: (2025)
Optimizing Byte-level Representation for End-to-end ASR
by: Hsiao, Roger, et al.
Published: (2024)
OCR-Enhanced Multimodal ASR Can Read While Listening
by: Chen, Junli, et al.
Published: (2026)
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
by: Gündüz, Ahmet, et al.
Published: (2024)
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
by: Kumar, Shashi, et al.
Published: (2024)
TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)