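Staff View: Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data :: Library Catalog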

Bibliographic Details
Main Authors:	Azzouz, Sofiane; Vuissoz, Pierre-André; Laprie, Yves
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.00914

_version_	1866917337005817856
author	Azzouz, Sofiane; Vuissoz, Pierre-André; Laprie, Yves
author_facet	Azzouz, Sofiane; Vuissoz, Pierre-André; Laprie, Yves
contents	Acoustic-to-articulatory inversion has often been limited to a small part of the vocal tract because the data are generally EMA (ElectroMagnetic Articulography) data, which require sensors to be glued to easily accessible articulators. The presented acoustic-to-articulatory model addresses the inversion of the entire vocal tract, from the glottis through the complete tongue and the velum to the lips. It relies on a real-time dynamic MRI database of more than 3 hours of speech. The data are the denoised speech signal and the automatically segmented articulator contours. Several bidirectional LSTM-based approaches have been used, either inverting each articulator individually or inverting all articulators simultaneously. To our knowledge, this is the first complete inversion of the vocal tract. The average RMSE on the test set is 1.65 mm, to be compared with the pixel size of 1.62 mm.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_00914
institution	arXiv
publishDate	2025
record_format	arxiv
title	Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2510.00914

Similar Items