| Main Authors: | Wu, Ke, Variani, Ehsan, Bagby, Tom, Reddy, Shashir, Pilgrim, Rory |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.16555 |
Similar Items
Scalable Offline ASR for Command-Style Dictation in Courtrooms
by: Nethil, Kumarmanas, et al.
Published: (2025)
MDM-ASR: Bridging Accuracy and Efficiency in ASR with Diffusion-Based Non-Autoregressive Decoding
by: Yen, Hao, et al.
Published: (2026)
The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models
by: Adedeji, Ayo, et al.
Published: (2024)
Reverb: Open-Source ASR and Diarization from Rev
by: Bhandari, Nishchal, et al.
Published: (2024)
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
by: Geng, Xuelong, et al.
Published: (2024)
DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
by: Li, Li, et al.
Published: (2026)
EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed
by: Zhuang, Ziyang, et al.
Published: (2024)
Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction
by: Baroudi, Séverin, et al.
Published: (2026)
All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR
by: Moriya, Takafumi, et al.
Published: (2025)
Performant ASR Models for Medical Entities in Accented Speech
by: Afonja, Tejumade, et al.
Published: (2024)
DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation
by: Male, Prabash Reddy, et al.
Published: (2025)
ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy
by: Wu, Ya-Tse, et al.
Published: (2026)
Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)
How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools
by: Răgman, Teodora, et al.
Published: (2026)
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
by: Kumar, Shashi, et al.
Published: (2024)
Towards a Single ASR Model That Generalizes to Disordered Speech
by: Tobin, Jimmy, et al.
Published: (2024)
Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter
by: Xi, Yu, et al.
Published: (2024)
Inverse-Hessian Regularization for Continual Learning in ASR
by: Eeckt, Steven Vander, et al.
Published: (2026)
LA-RAG:Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
by: Li, Shaojun, et al.
Published: (2024)
The USTC-NERCSLIP Systems for The ICMC-ASR Challenge
by: Wu, Minghui, et al.
Published: (2024)
Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)
Open-Source System for Multilingual Translation and Cloned Speech Synthesis
by: Cámara, Mateo, et al.
Published: (2025)
SOT Triggered Neural Clustering for Speaker Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2024)
DNCASR: End-to-End Training for Speaker-Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2025)
An investigation of modularity for noise robustness in conformer-based ASR
by: de Gibson, Louise Coppieters, et al.
Published: (2024)
Towards scalable efficient on-device ASR with transfer learning
by: Pandey, Laxmi, et al.
Published: (2024)
Evaluation of Speech Foundation Models for ASR on Child-Adult Conversations in Autism Diagnostic Sessions
by: Ashvin, Aditya, et al.
Published: (2024)
A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR
by: Yen, Hao, et al.
Published: (2025)
The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders?
by: Adedeji, Ayo, et al.
Published: (2025)
Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
by: Chhibber, Manasi, et al.
Published: (2025)
FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing
by: Santo, Gloria Dal, et al.
Published: (2024)
Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
by: Tang, Yuxun, et al.
Published: (2024)
NLE: Non-autoregressive LLM-based ASR by Transcript Editing
by: Dekel, Avihu, et al.
Published: (2026)
The THUEE System Description for the IARPA OpenASR21 Challenge
by: Zhao, Jing, et al.
Published: (2022)
Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction
by: Zhang, Xiangyu, et al.
Published: (2024)
Contextual Biasing for Streaming ASR via CTC-based Word Spotting
by: Tsai, Kai-Chen, et al.
Published: (2026)
Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts
by: Saon, George, et al.
Published: (2026)
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)