Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autor principal:	Ganchev, Radan
Formato:	Preprint
Publicado:	2024
Materias:	Sound Machine Learning Audio and Speech Processing
Acceso en línea:	https://arxiv.org/abs/2403.20202
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866913290536353792
author	Ganchev, Radan
author_facet	Ganchev, Radan
contents	The widespread use of automated voice assistants along with other recent technological developments have increased the demand for applications that process audio signals and human voice in particular. Voice recognition tasks are typically performed using artificial intelligence and machine learning models. Even though end-to-end models exist, properly pre-processing the signal can greatly reduce the complexity of the task and allow it to be solved with a simpler ML model and fewer computational resources. However, ML engineers who work on such tasks might not have a background in signal processing which is an entirely different area of expertise. The objective of this work is to provide a concise comparative analysis of Fourier and Wavelet transforms that are most commonly used as signal decomposition methods for audio processing tasks. Metrics for evaluating speech intelligibility are also discussed, namely Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). The level of detail in the exposition is meant to be sufficient for an ML engineer to make informed decisions when choosing, fine-tuning, and evaluating a decomposition method for a specific ML model. The exposition contains mathematical definitions of the relevant concepts accompanied with intuitive non-mathematical explanations in order to make the text more accessible to engineers without deep expertise in signal processing. Formal mathematical definitions and proofs of theorems are intentionally omitted in order to keep the text concise.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_20202
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Voice Signal Processing for Machine Learning. The Case of Speaker Isolation Ganchev, Radan Sound Machine Learning Audio and Speech Processing The widespread use of automated voice assistants along with other recent technological developments have increased the demand for applications that process audio signals and human voice in particular. Voice recognition tasks are typically performed using artificial intelligence and machine learning models. Even though end-to-end models exist, properly pre-processing the signal can greatly reduce the complexity of the task and allow it to be solved with a simpler ML model and fewer computational resources. However, ML engineers who work on such tasks might not have a background in signal processing which is an entirely different area of expertise. The objective of this work is to provide a concise comparative analysis of Fourier and Wavelet transforms that are most commonly used as signal decomposition methods for audio processing tasks. Metrics for evaluating speech intelligibility are also discussed, namely Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). The level of detail in the exposition is meant to be sufficient for an ML engineer to make informed decisions when choosing, fine-tuning, and evaluating a decomposition method for a specific ML model. The exposition contains mathematical definitions of the relevant concepts accompanied with intuitive non-mathematical explanations in order to make the text more accessible to engineers without deep expertise in signal processing. Formal mathematical definitions and proofs of theorems are intentionally omitted in order to keep the text concise.
title	Voice Signal Processing for Machine Learning. The Case of Speaker Isolation
topic	Sound Machine Learning Audio and Speech Processing
url	https://arxiv.org/abs/2403.20202

Ejemplares similares