PiMOD (Pitch Estimation by Multiple Octave Decoders) is a neural-network-based pitch estimation method.
Yael Segal ([email protected])
Joseph Keshet ([email protected])
PiMOD is a software package for automatic pitch estimation. We propose a neural architecture composed of a CNN as a feature extractor and linear layers as multiple octave decoders.
The model was presented in the paper Pitch Estimation by Multiple Octave Decoders. If you find our work useful, please cite:
@article{segal2021pitch,
title={Pitch Estimation by Multiple Octave Decoders},
author={Segal, Yael and Arama-Chayoth, May and Keshet, Joseph},
journal={IEEE Signal Processing Letters},
volume={28},
pages={1610--1614},
year={2021},
publisher={IEEE}
}
Python 3.9+
Download the code:
git clone https://github.com/MLSpeech/PiMOD
python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
Place your .wav files and their label files (.txt) in the same directory. The code assumes a main directory containing numbered fold subdirectories, where each fold can be assigned to train/val/test. The data structure is as follows:
SOME_PATH/DATA_PATH
├───1
│       1.wav
│       1.txt
│       2.wav
│       2.txt
└───2
        3.wav
        3.txt
        4.wav
        4.txt
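If your corpus starts out as a flat directory of paired files, a small script can arrange it into the layout above. The following is a minimal sketch, not part of the package; the raw_data source directory and the five-fold split are assumptions:

```python
import shutil
from pathlib import Path

src = Path("raw_data")            # hypothetical flat directory of paired .wav/.txt files
dst = Path("SOME_PATH/DATA_PATH")
n_folds = 5

for i, wav in enumerate(sorted(src.glob("*.wav"))):
    label = wav.with_suffix(".txt")          # matching labels file, e.g. 1.wav -> 1.txt
    fold_dir = dst / str(i % n_folds + 1)    # folds are numbered 1..n_folds
    fold_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(wav, fold_dir / wav.name)
    shutil.copy(label, fold_dir / label.name)
```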
Example of a training command:
python run.py --data SOME_PATH/DATA_PATH --train_folds 1_2_3 --val_folds 4 --test_folds 5 --experiment_name my_exp --cuda --neptune
Some highlights:
- The code supports logging to Neptune via the --neptune flag.
- To run on a GPU, use the --cuda flag.
- See run.py for the full list of script hyper-parameters.
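The fold flags also make k-fold cross-validation easy to drive externally. Below is a sketch (not part of the repository) that rotates five folds through the --train_folds/--val_folds/--test_folds arguments shown above:

```python
import subprocess

folds = [1, 2, 3, 4, 5]
for test_fold in folds:
    val_fold = test_fold % 5 + 1                      # the next fold becomes validation
    train = [f for f in folds if f not in (test_fold, val_fold)]
    subprocess.run([
        "python", "run.py",
        "--data", "SOME_PATH/DATA_PATH",
        "--train_folds", "_".join(map(str, train)),   # e.g. "3_4_5"
        "--val_folds", str(val_fold),
        "--test_folds", str(test_fold),
        "--experiment_name", f"cv_test{test_fold}",
        "--cuda",
    ], check=True)
```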
python predict.py --data SOME_PATH/DATA_PATH/FILE.wav --cuda --model ./model/6_decoders_MDB_KEELE.pth --batch_size 32 --outpath OUTPUT_DIRECTORY
Some highlights:
- The prediction script can run on a directory (with multiple files) or a single file.
- To run on GPU, use the flag --cuda
- The default model is ./model/6_decoders_MDB_KEELE.pth, which was trained on the MDB-STEM-Synth [1] and Keele [2] datasets.
Example of a prediction file:
time,frequency,confidence
0.000, 0.000, 1.000
0.010, 0.000, 1.000
0.020, 0.000, 1.000
0.030, 0.000, 1.000
0.040, 0.000, 1.000
0.050, 0.000, 1.000
0.060, 0.000, 1.000
0.070, 0.000, 1.000
0.080, 0.000, 1.000
0.090, 0.000, 1.000
0.100, 0.000, 1.000
0.110, 231.109, 0.730
0.120, 217.145, 1.000
0.130, 212.794, 1.000
0.140, 209.846, 1.000
0.150, 210.985, 1.000
0.160, 210.410, 1.000
0.170, 213.200, 1.000
0.180, 215.494, 1.000
0.190, 217.457, 1.000
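Each row holds a frame's time stamp (seconds), estimated frequency (Hz, where 0.000 appears to mark unvoiced frames), and the model's confidence. Here is a minimal sketch for reading such a file and keeping only the voiced frames; the output file name is hypothetical:

```python
import csv

voiced = []
with open("OUTPUT_DIRECTORY/FILE.csv") as f:     # hypothetical output file name
    for row in csv.DictReader(f):                # columns: time, frequency, confidence
        freq = float(row["frequency"])
        if freq > 0.0:                           # skip frames with zero frequency
            voiced.append((float(row["time"]), freq, float(row["confidence"])))

print(f"{len(voiced)} voiced frames")
```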
- The current version only supports WAV files as input. The model was trained on 16 kHz audio, so input audio at a different sample rate is first resampled to 16 kHz using librosa (for training, this slows things down; see the resampling sketch after this list).
- Prediction and training are significantly faster on a GPU.
- The prediction script loads the data with only one worker. If your machine has more than one core, you can set num_workers to the number of cores to speed up prediction.
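To avoid the slowdown from on-the-fly resampling during training, you can resample your WAV files to 16 kHz once, up front. A minimal sketch using librosa (which the package already uses for resampling) and soundfile; overwriting the files in place is an assumption, adjust to taste:

```python
import librosa
import soundfile as sf
from pathlib import Path

for wav in Path("SOME_PATH/DATA_PATH").rglob("*.wav"):
    y, _ = librosa.load(wav, sr=16000)   # librosa resamples to 16 kHz on load
    sf.write(wav, y, 16000)              # overwrite in place at the target rate
```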
[1] J. W. Kim, J. Salamon, P. Li, and J. P. Bello, “CREPE: A convolutional representation for pitch estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2018, pp. 161–165.
[2] F. Plante, G. F. Meyer, and W. A. Ainsworth, “A pitch extraction reference database,” in Proc. 4th Eur. Conf. Speech Commun. Technol., 1995, pp. 837–840.