Bibliographic Details
Main authors: Gupta, Shubham; Gomez-Sarmiento, Isaac Neri; Mezdari, Faez Amjed; Ravanelli, Mirco; Subakan, Cem
Format: Preprint
Published: 2024
Subjects: Machine Learning; Artificial Intelligence; Sound; Audio and Speech Processing
Online access: https://arxiv.org/abs/2410.05455
_version_ 1866913535926206464
author Gupta, Shubham
Gomez-Sarmiento, Isaac Neri
Mezdari, Faez Amjed
Ravanelli, Mirco
Subakan, Cem
contents We propose a novel approach for humming transcription that combines a CNN-based architecture with a dynamic programming-based post-processing algorithm, utilizing the recently introduced HumTrans dataset. We identify and address inherent problems with the offset and onset ground truth provided by the dataset, offering heuristics to improve these annotations, resulting in a dataset with precise annotations that will aid future research. Additionally, we compare the transcription accuracy of our method against several others, demonstrating state-of-the-art (SOTA) results. All our code and the corrected dataset are available at https://github.com/shubham-gupta-30/humming_transcription
format Preprint
id arxiv_https___arxiv_org_abs_2410_05455
institution arXiv
publishDate 2024
record_format arxiv
title Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming
topic Machine Learning
Artificial Intelligence
Sound
Audio and Speech Processing
url https://arxiv.org/abs/2410.05455