Bibliographic Details
Main Author: Sivasubramaniam, Nithyashree
Format: Preprint
Published: 2025
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2509.04507
_version_ 1866915480104599552
author Sivasubramaniam, Nithyashree
author_facet Sivasubramaniam, Nithyashree
contents Silent Speech Interfaces (SSIs) have gained attention for their ability to generate intelligible speech from non-acoustic signals. While significant progress has been made in advancing speech generation pipelines, limited work has addressed the recognition and downstream processing of synthesized speech, which often suffers from phonetic ambiguity and noise. To overcome these challenges, we propose an enhanced automatic speech recognition framework that combines a transformer-based acoustic model with a large language model (LLM) for post-processing. The transformer captures full utterance context, while the LLM ensures linguistic consistency. Experimental results show a 16% relative and 6% absolute reduction in word error rate (WER) over a 36% baseline, demonstrating substantial improvements in intelligibility for silent speech interfaces.
format Preprint
id arxiv_https___arxiv_org_abs_2509_04507
institution arXiv
publishDate 2025
record_format arxiv
title From Silent Signals to Natural Language: A Dual-Stage Transformer-LLM Approach
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2509.04507
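
The abstract reports its result as both an absolute and a relative reduction in word error rate (WER). The sketch below shows how the two figures relate. It is an illustration only, not the paper's evaluation code: the function names `word_error_rate` and `wer_reduction` are hypothetical, and the improved WER of 30% is an assumption derived from the rounded figures in the abstract (36% baseline minus 6 points), since the exact values are not given in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction, both as fractions."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline - improved
    relative = absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    # Assumed values: 0.36 is the baseline quoted in the abstract;
    # 0.30 is inferred from the rounded 6-point absolute reduction.
    absolute, relative = wer_reduction(0.36, 0.30)
    print(f"absolute reduction: {absolute:.1%}")
    print(f"relative reduction: {relative:.1%}")
```

With these rounded inputs the relative reduction comes out near 16.7%, close to the 16% quoted in the abstract; the small gap is plausibly due to rounding of the underlying WER values, which this record does not provide.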