:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Ke; Variani, Ehsan; Bagby, Tom; Reddy, Shashir; Pilgrim, Rory
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2605.16555

Similar Items

Scalable Offline ASR for Command-Style Dictation in Courtrooms
by: Nethil, Kumarmanas, et al.
Published: (2025)

MDM-ASR: Bridging Accuracy and Efficiency in ASR with Diffusion-Based Non-Autoregressive Decoding
by: Yen, Hao, et al.
Published: (2026)

The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models
by: Adedeji, Ayo, et al.
Published: (2024)

Reverb: Open-Source ASR and Diarization from Rev
by: Bhandari, Nishchal, et al.
Published: (2024)

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
by: Geng, Xuelong, et al.
Published: (2024)

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
by: Li, Li, et al.
Published: (2026)

EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed
by: Zhuang, Ziyang, et al.
Published: (2024)

Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction
by: Baroudi, Séverin, et al.
Published: (2026)

All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR
by: Moriya, Takafumi, et al.
Published: (2025)

Performant ASR Models for Medical Entities in Accented Speech
by: Afonja, Tejumade, et al.
Published: (2024)

DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation
by: Male, Prabash Reddy, et al.
Published: (2025)

ASR for Affective Speech: Investigating Impact of Emotion and Speech Generative Strategy
by: Wu, Ya-Tse, et al.
Published: (2026)

Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools
by: Răgman, Teodora, et al.
Published: (2026)

XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
by: Kumar, Shashi, et al.
Published: (2024)

Towards a Single ASR Model That Generalizes to Disordered Speech
by: Tobin, Jimmy, et al.
Published: (2024)

Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter
by: Xi, Yu, et al.
Published: (2024)

Inverse-Hessian Regularization for Continual Learning in ASR
by: Eeckt, Steven Vander, et al.
Published: (2026)

LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
by: Li, Shaojun, et al.
Published: (2024)

The USTC-NERCSLIP Systems for The ICMC-ASR Challenge
by: Wu, Minghui, et al.
Published: (2024)

Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)

Open-Source System for Multilingual Translation and Cloned Speech Synthesis
by: Cámara, Mateo, et al.
Published: (2025)

SOT Triggered Neural Clustering for Speaker Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2024)

DNCASR: End-to-End Training for Speaker-Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2025)

An investigation of modularity for noise robustness in conformer-based ASR
by: de Gibson, Louise Coppieters, et al.
Published: (2024)

Towards scalable efficient on-device ASR with transfer learning
by: Pandey, Laxmi, et al.
Published: (2024)

Evaluation of Speech Foundation Models for ASR on Child-Adult Conversations in Autism Diagnostic Sessions
by: Ashvin, Aditya, et al.
Published: (2024)

A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR
by: Yen, Hao, et al.
Published: (2025)

The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders?
by: Adedeji, Ayo, et al.
Published: (2025)

Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
by: Chhibber, Manasi, et al.
Published: (2025)

FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing
by: Santo, Gloria Dal, et al.
Published: (2024)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
by: Tang, Yuxun, et al.
Published: (2024)

NLE: Non-autoregressive LLM-based ASR by Transcript Editing
by: Dekel, Avihu, et al.
Published: (2026)

The THUEE System Description for the IARPA OpenASR21 Challenge
by: Zhao, Jing, et al.
Published: (2022)

Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction
by: Zhang, Xiangyu, et al.
Published: (2024)

Contextual Biasing for Streaming ASR via CTC-based Word Spotting
by: Tsai, Kai-Chen, et al.
Published: (2026)

Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts
by: Saon, George, et al.
Published: (2026)

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)