Bibliographic Details
Main Authors: Gauy, Marcelo Matheus, Koza, Natalia Hitomi, Morita, Ricardo Mikio, Stanzione, Gabriel Rocha, Junior, Arnaldo Candido, Berti, Larissa Cristina, Levin, Anna Sara Shafferman, Sabino, Ester Cerdeira, Svartman, Flaviane Romani Fernandes, Finger, Marcelo
Format: Preprint
Published: 2024
Subjects: Sound, Machine Learning, Audio and Speech Processing
Online Access: https://arxiv.org/abs/2407.20989
author Gauy, Marcelo Matheus
Koza, Natalia Hitomi
Morita, Ricardo Mikio
Stanzione, Gabriel Rocha
Junior, Arnaldo Candido
Berti, Larissa Cristina
Levin, Anna Sara Shafferman
Sabino, Ester Cerdeira
Svartman, Flaviane Romani Fernandes
Finger, Marcelo
contents We contrast the effectiveness of state-of-the-art deep learning architectures, designed for general audio classification and refined here, on two tasks based on automated audio analysis: respiratory insufficiency (RI) detection, and blood oxygen saturation (SpO$_2$) estimation and classification. Recently, multiple deep learning architectures have been proposed to detect RI in COVID patients through audio analysis, achieving accuracy above 95% and F1-score above 0.93. RI is a condition associated with low SpO$_2$ levels, commonly defined by the threshold SpO$_2$ < 92%. While SpO$_2$ is a crucial determinant of RI, a medical doctor's diagnosis typically relies on multiple factors, including respiratory frequency, heart rate, and SpO$_2$ levels, among others. Here we study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection, where these models achieve near-perfect accuracy, surpassing previous results. Yet, for the regression task of estimating SpO$_2$ levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters. Additionally, Pearson correlation coefficients fail to surpass 0.3. As deep learning models perform better in classification than regression, we transform SpO$_2$ regression into a SpO$_2$-threshold binary classification problem, with a threshold of 92%. However, this task still yields an F1-score below 0.65. Thus, audio analysis offers valuable insights into a patient's RI status but does not provide accurate information about actual SpO$_2$ levels, indicating a separation between the domains in which voice and speech biomarkers may and may not be useful in medical diagnostics under current technologies.
format Preprint
id arxiv_https___arxiv_org_abs_2407_20989
institution arXiv
publishDate 2024
record_format arxiv
title Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation
topic Sound
Machine Learning
Audio and Speech Processing
url https://arxiv.org/abs/2407.20989