Staff View: Scalable Offline ASR for Command-Style Dictation in Courtrooms :: Library Catalog

Bibliographic Details
Main Authors:	Nethil, Kumarmanas; Mishra, Vaibhav; Anandan, Kriti; Manohar, Kavya
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2507.01021

_version_	1866908536745754624
author	Nethil, Kumarmanas; Mishra, Vaibhav; Anandan, Kriti; Manohar, Kavya
author_facet	Nethil, Kumarmanas; Mishra, Vaibhav; Anandan, Kriti; Manohar, Kavya
contents	We propose an open-source framework for Command-style dictation that addresses the gap between resource-intensive Online systems and high-latency Batch processing. Our approach uses Voice Activity Detection (VAD) to segment audio and transcribes these segments in parallel using Whisper models, enabling efficient multiplexing across audios. Unlike proprietary systems like SuperWhisper, this framework is also compatible with most ASR architectures, including widely used CTC-based models. Our multiplexing technique maximizes compute utilization in real-world settings, as demonstrated by its deployment in around 15% of India's courtrooms. Evaluations on live data show consistent latency reduction as user concurrency increases, compared to sequential batch processing. The live demonstration will showcase our open-sourced implementation and allow attendees to interact with it in real-time.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_01021
institution	arXiv
publishDate	2025
record_format	arxiv
title	Scalable Offline ASR for Command-Style Dictation in Courtrooms
topic	Audio and Speech Processing; Computation and Language; Sound
url	https://arxiv.org/abs/2507.01021

Similar Items