Bibliographic Details
Main Authors: Marttila, David; Reiss, Joshua D.
Format: Preprint
Published: 2025
Subjects: Sound; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2507.11233
author Marttila, David
Reiss, Joshua D.
contents Neural networks have become the dominant technique for accurate pitch and periodicity estimation. Although a lot of research has gone into improving network architectures and training paradigms, most approaches operate directly on the raw audio waveform or on general-purpose time-frequency representations. We investigate the use of Sawtooth-Inspired Pitch Estimation (SWIPE) kernels as an audio frontend and find that these hand-crafted, task-specific features can make neural pitch estimators more accurate, robust to noise, and more parameter-efficient. We evaluate supervised and self-supervised state-of-the-art architectures on common datasets and show that the SWIPE audio frontend allows for reducing the network size by an order of magnitude without performance degradation. Additionally, we show that the SWIPE algorithm on its own is much more accurate than commonly reported, outperforming state-of-the-art self-supervised neural pitch estimators.
format Preprint
institution arXiv
publishDate 2025
title Improving Neural Pitch Estimation with SWIPE Kernels
topic Sound
Audio and Speech Processing
url https://arxiv.org/abs/2507.11233