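| Main authors: | Gupta, Shubham; Gomez-Sarmiento, Isaac Neri; Mezdari, Faez Amjed; Ravanelli, Mirco; Subakan, Cem |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning; Artificial Intelligence; Sound; Audio and Speech Processing |
| Online access: | https://arxiv.org/abs/2410.05455 |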
| _version_ | 1866913535926206464 |
|---|---|
| author | Gupta, Shubham; Gomez-Sarmiento, Isaac Neri; Mezdari, Faez Amjed; Ravanelli, Mirco; Subakan, Cem |
| author_facet | Gupta, Shubham; Gomez-Sarmiento, Isaac Neri; Mezdari, Faez Amjed; Ravanelli, Mirco; Subakan, Cem |
| contents | We propose a novel approach for humming transcription that combines a CNN-based architecture with a dynamic programming-based post-processing algorithm, utilizing the recently introduced HumTrans dataset. We identify and address inherent problems with the onset and offset ground truth provided by the dataset and offer heuristics to improve these annotations, resulting in a dataset with precise annotations that will aid future research. Additionally, we compare the transcription accuracy of our method against several others, demonstrating state-of-the-art (SOTA) results. All our code and the corrected dataset are available at https://github.com/shubham-gupta-30/humming_transcription |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2410_05455 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming |
| topic | Machine Learning; Artificial Intelligence; Sound; Audio and Speech Processing |
| url | https://arxiv.org/abs/2410.05455 |