Bibliographic Details
Main Authors: Azzouz, Sofiane, Vuissoz, Pierre-André, Laprie, Yves
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing
Online Access: https://arxiv.org/abs/2510.00914
author Azzouz, Sofiane
Vuissoz, Pierre-André
Laprie, Yves
contents Acoustic-to-articulatory inversion has often been limited to a small part of the vocal tract because the available data are generally EMA (ElectroMagnetic Articulography) data, which require sensors to be glued to easily accessible articulators. The presented acoustic-to-articulatory model inverts the entire vocal tract, from the glottis through the complete tongue and the velum to the lips. It relies on a real-time dynamic MRI database of more than 3 hours of speech. The data are the denoised speech signal and the automatically segmented articulator contours. Several bidirectional LSTM-based approaches were used, inverting either each articulator individually or all articulators simultaneously. To our knowledge, this is the first complete inversion of the vocal tract. The average RMSE on the test set is 1.65 mm, to be compared with the pixel size of 1.62 mm.
format Preprint
id arxiv_https___arxiv_org_abs_2510_00914
institution arXiv
publishDate 2025
record_format arxiv
title Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data
topic Audio and Speech Processing
url https://arxiv.org/abs/2510.00914