Bibliographic Details
Main Authors: Luo, Yuxin, Zhang, Ruoyi, Liu, Lu-Chuan, Li, Tianyu, Liu, Hangyu
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2509.15140
Abstract:
  • Pitch estimation (PE) in monophonic audio is crucial for MIDI transcription and singing voice conversion (SVC), but existing methods degrade significantly under noise. In this paper, we propose FCPE, a fast context-based pitch estimation model that employs a Lynx-Net architecture with depth-wise separable convolutions to capture mel spectrogram features effectively while maintaining low computational cost and robust noise tolerance. Experiments show that our method achieves 96.79% Raw Pitch Accuracy (RPA) on the MIR-1K dataset, on par with state-of-the-art methods. Its Real-Time Factor (RTF) is 0.0062 on a single RTX 4090 GPU, significantly outperforming existing algorithms in efficiency. Code is available at https://github.com/CNChTu/FCPE.
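The abstract reports two metrics, Raw Pitch Accuracy (RPA) and Real-Time Factor (RTF). The following is a minimal sketch of how these are commonly defined, not the evaluation code used for FCPE. It assumes frame-aligned pitch tracks in Hz, treats 0 Hz as unvoiced, and uses the conventional 50-cent tolerance; the function names `raw_pitch_accuracy` and `real_time_factor` are hypothetical, and the record does not state the paper's exact evaluation protocol.

```python
import math


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Fraction of reference-voiced frames whose estimate is within tol_cents.

    Assumptions (not taken from the FCPE record): both tracks are frame-aligned,
    a value <= 0 Hz marks an unvoiced frame, and the tolerance is 50 cents.
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("reference and estimate must have the same number of frames")
    voiced = 0
    correct = 0
    for r, e in zip(ref_hz, est_hz):
        if r <= 0:
            continue  # only frames voiced in the reference are scored
        voiced += 1
        # Pitch error in cents: 1200 * log2(f_est / f_ref); unvoiced estimates count as misses.
        if e > 0 and abs(1200.0 * math.log2(e / r)) <= tol_cents:
            correct += 1
    return correct / voiced if voiced else 0.0


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / duration of the processed audio (lower is faster)."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    ref = [220.0, 220.0, 0.0, 440.0]
    est = [221.0, 233.08, 100.0, 0.0]  # ~8 cents off, ~100 cents off, (unscored), missed
    print(raw_pitch_accuracy(ref, est))  # one of three voiced frames is correct
    print(real_time_factor(0.062, 10.0))  # about 0.0062
```

Read this way, an RTF of 0.0062 means that 10 s of audio would take roughly 62 ms to process.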