Replaced HPS pitch estimation with SWIPE (ultra-slim) #330
Conversation
Thank you very much @a5hun for your contribution, it's very much appreciated. I have a couple of questions:
Thanks for the screenshots for comparison; it does indeed seem more precise. Thanks also for the move from scipy to numpy. Regarding hiding zeroes or NaN in plots, I think this is something that could be addressed; it would also allow cleaning up other plots, like the long-time level widget.
@a5hun Would you be able to rebase your changes please? There are merge conflicts following changes I made in pitch_tracker.py... Sorry about that!
(I'm also curious if it could be realistic to use CREPE or SwiftF0) |
The pitch tracker now works with the latest friture changes. Kernel generation is slightly different: it was tuned with vocal stems (male and female) to return fewer spurious voiced pitches.
Not sure this is exactly what you asked for. A git rebase turned out to be harder than writing a SWIPE-like pitch tracker! Everything on my branch should now be current with friture's master.

CREPE is too slow for realtime. I've spent a lot of time testing various pitch tracking algorithms with high-quality isolated vocals (https://cambridge-mt.com/ms3/mtk/), and I settled on a customized SWIPE-ish one because it's fast, very precise, and can work on independent audio frames. I really like pYIN, but it's also too slow. AI models can produce good results (CREPE is good, SPICE not so much, and I'm playing with SwiftF0 now), but they're either too slow or don't work well on 4096 samples. https://github.com/lars76/pitch-benchmark/

For this clip, the calc time for my SWIPE algo is 0.122 seconds; SwiftF0 takes 1.52 seconds. That's at a 4096 FFT length, with the hop at 1/4 of that. I tried to implement SwiftF0 in Friture (it's very probably the better pitch estimator), but I ran into a lot of issues: it needs librosa and scipy (and more!), and I can't get onnxruntime to work without crashing. If you can untangle the dependencies, it's really easy to implement in code, but I don't know enough about which versions of what friture requires.
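For context, the kind of per-frame timing comparison quoted above (4096-sample frames, hop at 1/4 of the frame length) can be reproduced with a small harness of this shape. This is a sketch, not friture code; `time_estimator` and the estimator callable are hypothetical names:

```python
import time
import numpy as np

def time_estimator(estimate, samples, frame=4096, hop=1024):
    """Run `estimate` on every hop-spaced frame of `samples` and
    return the total wall-clock time in seconds."""
    start = time.perf_counter()
    for i in range(0, len(samples) - frame + 1, hop):
        estimate(samples[i:i + frame])
    return time.perf_counter() - start
```

Swapping different estimators into the same harness keeps the framing and hop identical, so the timings stay comparable.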
Thank you very much @a5hun, that looks great! I will merge as is. I'm also curious whether there are algorithms capable of handling multiple sources, because I can imagine that could be interesting in a setting with several voices and instruments.


Replaced the Harmonic Product Spectrum (HPS) pitch estimator with an ultra-slim implementation of SWIPE (Sawtooth Waveform Inspired Pitch Estimator), adapted from libf0 and based on Arturo Camacho's original algorithm:
https://ufdcimages.uflib.ufl.edu/UF/E0/02/15/89/00001/camacho_a.pdf
SWIPE is an extremely accurate and performant pitch estimator. It works on log-spaced frequency bins for precision and performs well for most voices. Arturo's original implementation and both libf0 versions (full and slim) use multiple FFT sizes [128, 256, 512, 1024, 2048, and 4096] to reduce the pitch-strength loss that suboptimal window sizes cause at different frequencies, but pitch estimation is still very good with just a single FFT size (4096 for sr > 44100). I've also slightly modified the kernels to weight the fundamental and first harmonic evenly, which helps prevent cases where a higher-strength first harmonic (as in a sung or voiced "eh" ⟨e⟩) is falsely detected as the pitch.
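To illustrate the core idea (this is a simplified sketch, not friture's actual kernel: it uses linearly spaced bins instead of SWIPE's log-spaced ones, Gaussian harmonic bumps with a classic 1/k decay rather than the even fundamental/first-harmonic weighting described above, and the hypothetical name `swipe_like_strength`), the pitch strength of a candidate f0 can be computed as the normalized correlation of the magnitude spectrum with a kernel peaked at the candidate's harmonics:

```python
import numpy as np

def swipe_like_strength(spectrum, freqs, f0, n_harmonics=8, width=10.0):
    """Toy SWIPE-flavoured pitch strength for one candidate f0:
    correlate the magnitude spectrum with Gaussian bumps placed at
    the candidate's harmonics, with weights decaying as 1/k."""
    kernel = np.zeros_like(freqs)
    for k in range(1, n_harmonics + 1):
        kernel += (1.0 / k) * np.exp(-0.5 * ((freqs - k * f0) / width) ** 2)
    # normalized correlation, so strengths are comparable across candidates
    denom = np.linalg.norm(spectrum) * np.linalg.norm(kernel)
    return float(spectrum @ kernel / denom) if denom > 0 else 0.0
```

The normalization penalizes subharmonic candidates (whose kernels spread energy over many bumps the spectrum does not fill), which is one way octave-below errors get suppressed; the real algorithm additionally shapes the kernel and bin spacing as described in Camacho's thesis.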
To better align with other voice pitch trackers, a new frequency scale has been added, centered around C, with a tick for each note of the chromatic scale.
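The tick positions for such a scale follow from equal temperament. A minimal sketch (hypothetical names, not friture's code; assumes A4 = 440 Hz and MIDI note numbering):

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chromatic_ticks(midi_low=48, midi_high=72):
    """Return (frequency_hz, label) pairs for each semitone in the
    given MIDI note range, using equal temperament with A4 = 440 Hz."""
    ticks = []
    for m in range(midi_low, midi_high + 1):
        freq = 440.0 * 2.0 ** ((m - 69) / 12.0)
        label = f"{NOTE_NAMES[m % 12]}{m // 12 - 1}"
        ticks.append((freq, label))
    return ticks
```

Since semitones are a constant ratio (2^(1/12)) apart, these ticks land evenly spaced on a logarithmic frequency axis, which pairs naturally with SWIPE's log-spaced bins.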