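Staff View: Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation :: Library Catalog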


Bibliographic Details
Main Authors:	Gauy, Marcelo Matheus; Koza, Natalia Hitomi; Morita, Ricardo Mikio; Stanzione, Gabriel Rocha; Junior, Arnaldo Candido; Berti, Larissa Cristina; Levin, Anna Sara Shafferman; Sabino, Ester Cerdeira; Svartman, Flaviane Romani Fernandes; Finger, Marcelo
Format:	Preprint
Published:	2024
Subjects:	Sound, Machine Learning, Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.20989

_version_	1866914897181278208
author	Gauy, Marcelo Matheus; Koza, Natalia Hitomi; Morita, Ricardo Mikio; Stanzione, Gabriel Rocha; Junior, Arnaldo Candido; Berti, Larissa Cristina; Levin, Anna Sara Shafferman; Sabino, Ester Cerdeira; Svartman, Flaviane Romani Fernandes; Finger, Marcelo
author_facet	Gauy, Marcelo Matheus; Koza, Natalia Hitomi; Morita, Ricardo Mikio; Stanzione, Gabriel Rocha; Junior, Arnaldo Candido; Berti, Larissa Cristina; Levin, Anna Sara Shafferman; Sabino, Ester Cerdeira; Svartman, Flaviane Romani Fernandes; Finger, Marcelo
contents	We contrast the effectiveness of state-of-the-art deep learning architectures designed for general audio classification tasks and fine-tuned, through automated audio analysis, for respiratory insufficiency (RI) detection on the one hand and for blood oxygen saturation (SpO$_2$) estimation and classification on the other. Recently, multiple deep learning architectures have been proposed to detect RI in COVID patients through audio analysis, achieving accuracy above 95% and F1-scores above 0.93. RI is a condition associated with low SpO$_2$ levels, commonly defined by the threshold SpO$_2$ < 92%. While SpO$_2$ is a crucial determinant of RI, a physician's diagnosis typically relies on multiple factors, including respiratory frequency, heart rate and SpO$_2$ levels, among others. Here we study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection; these models achieve near-perfect accuracy, surpassing previous results. For the regression task of estimating SpO$_2$ levels, however, the models yield root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters, and their Pearson correlation coefficients do not exceed 0.3. Since deep learning models perform better at classification than at regression, we recast SpO$_2$ regression as a binary SpO$_2$-threshold classification problem with a threshold of 92%. However, this task still yields an F1-score below 0.65. Thus, audio analysis offers valuable insight into a patient's RI status but does not provide accurate information about actual SpO$_2$ levels, indicating a separation of domains in which voice and speech biomarkers may and may not be useful in medical diagnostics under current technologies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_20989
institution	arXiv
publishDate	2024
record_format	arxiv
title	Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation
topic	Sound, Machine Learning, Audio and Speech Processing
url	https://arxiv.org/abs/2407.20989

Similar Items