Staff View: From Silent Signals to Natural Language: A Dual-Stage Transformer-LLM Approach :: Library Catalog

Bibliographic Details
Main Author:	Sivasubramaniam, Nithyashree
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.04507

_version_	1866915480104599552
author	Sivasubramaniam, Nithyashree
author_facet	Sivasubramaniam, Nithyashree
contents	Silent Speech Interfaces (SSIs) have gained attention for their ability to generate intelligible speech from non-acoustic signals. While significant progress has been made in advancing speech generation pipelines, limited work has addressed the recognition and downstream processing of synthesized speech, which often suffers from phonetic ambiguity and noise. To overcome these challenges, we propose an enhanced automatic speech recognition framework that combines a transformer-based acoustic model with a large language model (LLM) for post-processing. The transformer captures full utterance context, while the LLM ensures linguistic consistency. Experimental results show a 16% relative and 6% absolute reduction in word error rate (WER) over a 36% baseline, demonstrating substantial improvements in intelligibility for silent speech interfaces.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_04507
institution	arXiv
publishDate	2025
record_format	arxiv
title	From Silent Signals to Natural Language: A Dual-Stage Transformer-LLM Approach
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2509.04507
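
Note on the reported figures: the abstract in the contents field gives a 6% absolute and 16% relative reduction in word error rate (WER) over a 36% baseline. The sketch below only illustrates how those two kinds of reduction relate to each other. The function name and the implied post-processing WER of 30% are assumptions derived from the rounded figures in the abstract, not values stated in the record.

```python
def wer_reduction(baseline_wer: float, improved_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction.

    absolute is in percentage points; relative is a percentage of the baseline.
    """
    absolute = baseline_wer - improved_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Assumption: baseline WER of 36% and a 6-point absolute drop, as in the abstract.
baseline = 36.0
improved = baseline - 6.0  # implied WER after post-processing (not stated in the record)

absolute, relative = wer_reduction(baseline, improved)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

Under these assumptions the relative reduction comes out near 16.7%, which is in line with the roughly 16% quoted in the abstract once rounding of the underlying figures is taken into account.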