:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Pandey, Laxmi, Li, Ke, Guo, Jinxi, Paul, Debjyoti, Guo, Arthur, Mahadeokar, Jay, Zhang, Xuedong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.16664
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)

A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data
by: Tran, Minh, et al.
Published: (2025)

Effective internal language model training and fusion for factorized transducer model
by: Guo, Jinxi, et al.
Published: (2024)

A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR
by: You, Jian, et al.
Published: (2024)

A Parameter-efficient Language Extension Framework for Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)

PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
by: Kang, Wonjune, et al.
Published: (2024)

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)

Qwen3-ASR Technical Report
by: Shi, Xian, et al.
Published: (2026)

WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
by: Orhon, Atila, et al.
Published: (2025)

Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
by: Deng, Keqi, et al.
Published: (2024)

Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
by: Xu, Tianyi, et al.
Published: (2024)

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)

Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
by: Irigoyen, Julian, et al.
Published: (2025)

LA-RAG:Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
by: Li, Shaojun, et al.
Published: (2024)

Quantizing Whisper-small: How design choices affect ASR performance
by: Söhler, Arthur, et al.
Published: (2025)

Towards interfacing large language models with ASR systems using confidence measures and prompting
by: Naderi, Maryam, et al.
Published: (2024)

CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
by: Everson, Kevin, et al.
Published: (2024)

The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties
by: Chen, William, et al.
Published: (2025)

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
by: Geng, Xuelong, et al.
Published: (2024)

Can Speech LLMs Think while Listening?
by: Shih, Yi-Jen, et al.
Published: (2025)

Retrieval Augmented Generation based context discovery for ASR
by: Siskos, Dimitrios, et al.
Published: (2025)

Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems
by: Ren, Bo, et al.
Published: (2025)

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
by: Wang, Yi-Wei, et al.
Published: (2023)

Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)

Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
by: Khassanov, Yerbolat, et al.
Published: (2024)

Improving ASR Contextual Biasing with Guided Attention
by: Tang, Jiyang, et al.
Published: (2024)

MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
by: Nguyen, Thai-Binh, et al.
Published: (2024)

Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)

Mamba for Streaming ASR Combined with Unimodal Aggregation
by: Fang, Ying, et al.
Published: (2024)

Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
by: Fang, Yangui, et al.
Published: (2025)

Robust ASR Error Correction with Conservative Data Filtering
by: Udagawa, Takuma, et al.
Published: (2024)

Building English ASR model with regional language support
by: Agrawal, Purvi, et al.
Published: (2025)

Optimizing Byte-level Representation for End-to-end ASR
by: Hsiao, Roger, et al.
Published: (2024)

OCR-Enhanced Multimodal ASR Can Read While Listening
by: Chen, Junli, et al.
Published: (2026)

AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
by: Gündüz, Ahmet, et al.
Published: (2024)

TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
by: Kumar, Shashi, et al.
Published: (2024)

TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)