This is the repository of all the code used in our paper [...].
Install pyenv and poetry. You can also set up your environment with other
tools, as long as the dependencies listed in pyproject.toml are installed.
The following is an example setup using venv:
cd <repo>
git clone <this-repo-url>
pyenv install --patch 3.6.9 < python_alignment.patch; pyenv local 3.6.9
(or just use Python 3.6.9; note that the install command applies this patch)
python -m venv .venv; source .venv/bin/activate.fish
python -m pip install -U pip
python -m pip install -r requirements.txt
pip install git+https://github.com/CPJKU/madmom -c constraints.txt
(this is needed because magenta pins an exact, old version of mido in its requirements.txt)
python setup.py build_ext --inplace
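An optional sanity check, not part of the repo, that you are running the pinned interpreter inside the virtual environment created above. check_environment is a hypothetical helper; it only inspects sys.version_info and compares sys.prefix with sys.base_prefix, which differ when a venv is active.

```python
import sys


def check_environment(expected=(3, 6, 9)):
    """Return a list of problems with the current interpreter (empty if none)."""
    problems = []
    found = tuple(sys.version_info[:3])
    # The setup above pins Python 3.6.9 (patched via pyenv).
    if found != tuple(expected):
        problems.append(
            "expected Python {}, found {}".format(
                ".".join(map(str, expected)), ".".join(map(str, found))
            )
        )
    # Inside a venv, sys.prefix points to the venv and differs from sys.base_prefix.
    if sys.prefix == getattr(sys, "base_prefix", sys.prefix):
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```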
Then, you will need the vienna_corpus, SMD and Maestro datasets from the
asmd package:
python -m asmd.install
- Download our Vienna model pretrained on Maestro from our mega folder and put it in your working dir
- Train our proposed model or download the pretrained ones from our mega folder:
  - You will need the template matrix provided in this repo. To rebuild it, run
    python -m perceptual.make_template
    You will need the synthesized scale and the corresponding MIDI in the scales and audio folder. You can download them from our mega folder.
  - Run
    python -m perceptual.proposed create_mini_specs
    to create the dataset of mini-specs, or download it from our mega folder.
    - dataset size: 474,429 notes (831 batches in test, 178 in train)
  - Run
    python -m perceptual.proposed train
    to train our model for velocity estimation and test it. I obtained the following absolute error (avg, std) on the test set: 15.11, 10.94 (251 epochs)
  - Redo everything with the Vienna model (use --vienna for create_mini_specs and train)
- Run
python -m perceptual.excerpt_search
This analyzes vienna_corpus in search of excerpts, transcribes the
original performances, and creates a new directory audio with the audio files
of all extracted excerpts and a directory to_be_synthesized with all the
MIDI files that you have to synthesize and put in audio.
- Synthesize the chosen excerpts with VSTs, or download our synthesized
MIDIs from our mega folder and extract them into the audio directory. You should
have one directory for each VST in audio, and each VST directory should contain
5 different audio files. The root of audio should also contain the original recordings.
- Install sox in your PATH for the post-processing that adds reverb, or run with --no-postprocess
- Analyze the chosen excerpts:
python -m perceptual.chose_vst
This copies the excerpts corresponding to the chosen VSTs into the folder
excerpts.
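The velocity results of the training step are reported as the mean and standard deviation of the absolute error on the test set. A minimal sketch of that computation, assuming predictions and targets are plain sequences of MIDI velocities and that the population standard deviation is used (the repo's estimator may differ); abs_error_stats is a hypothetical name.

```python
import math


def abs_error_stats(predicted, target):
    """Mean and (population) standard deviation of |predicted - target|."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - t) for p, t in zip(predicted, target)]
    mean = sum(errors) / len(errors)
    # Population std (divide by n); the repo may use a different estimator.
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return mean, std


if __name__ == "__main__":
    # Toy velocities, not real results.
    print(abs_error_stats([64, 80, 100], [60, 90, 100]))
```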
Chosen VSTs:
- q0: ./audio/salamander
- q1: ['./audio/pianoteq1', './audio/salamander-norm_-20_reverb_50_norm']
- q2: ./audio/pianoteq1-norm_-20_reverb_100_norm
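A small, hypothetical helper (not in the repo) to check the audio layout expected before running perceptual.chose_vst: one sub-directory per VST containing 5 audio files, plus the original recordings at the root of audio. The accepted file extensions are an assumption.

```python
import os

# Assumption: extensions used for the synthesized and original audio files.
AUDIO_EXTS = (".wav", ".flac", ".mp3")


def check_audio_layout(audio_dir, expected_per_vst=5):
    """Return ({vst_dir: n_files} for VST dirs with the wrong count, n_root_files)."""
    wrong = {}
    root_files = 0
    for entry in sorted(os.listdir(audio_dir)):
        path = os.path.join(audio_dir, entry)
        if os.path.isdir(path):
            # Each VST directory should hold one file per excerpt.
            n = sum(1 for f in os.listdir(path) if f.lower().endswith(AUDIO_EXTS))
            if n != expected_per_vst:
                wrong[entry] = n
        elif entry.lower().endswith(AUDIO_EXTS):
            # Files at the root are the original recordings.
            root_files += 1
    return wrong, root_files


if __name__ == "__main__":
    print(check_audio_layout("audio"))
```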
Set up your server (Python or PHP) and download WAET.
- place the directory excerpts in the root of WAET
- place the directory reveal.js-3.9.2 in the root of WAET
- place the file index.html in the root of WAET (if you want, you can regenerate index.html by running pandoc --to revealjs -V revealjs-url=reveal.js-3.9.2 --output index.html --standalone index.md)
- place the file listening_test.xml in [WAET root]/tests/pool.xml
- place the file core.css in [WAET root]/css/core.css
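The copy steps listed above can be scripted. This sketch assumes all five items sit at the root of this repo and that the WAET root does not yet contain excerpts or reveal.js-3.9.2 (shutil.copytree refuses to overwrite an existing directory on Python 3.6); install_into_waet is a hypothetical name.

```python
import os
import shutil


def install_into_waet(repo_dir, waet_root):
    """Copy the listening-test files into a WAET checkout."""
    def src(name):
        return os.path.join(repo_dir, name)

    def dst(*parts):
        return os.path.join(waet_root, *parts)

    # Directories go to the WAET root; copytree fails if the target exists.
    for name in ("excerpts", "reveal.js-3.9.2"):
        shutil.copytree(src(name), dst(name))
    shutil.copy2(src("index.html"), dst("index.html"))
    # listening_test.xml is renamed to tests/pool.xml.
    os.makedirs(dst("tests"), exist_ok=True)
    shutil.copy2(src("listening_test.xml"), dst("tests", "pool.xml"))
    os.makedirs(dst("css"), exist_ok=True)
    shutil.copy2(src("core.css"), dst("css", "core.css"))
```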
You should be able to access your test at /test.html?url=php/pool.php.
More info is available in the WAET wiki.
index.html contains the instructions for the test, so you can
distribute the URL of the WAET root to your participants.
To plot the test results, run streamlit run perceptual_app.py,
which also prints the correlations with the objective measure of your choice.
The test answers that we collected are available in the repo.
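The app reports correlations between perceptual ratings and an objective measure. Which coefficient it uses is not stated here; as an illustration only, a plain Pearson correlation looks like this (pearson is a hypothetical helper, and it raises ZeroDivisionError on constant input).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Division by zero if either sequence is constant.
    return cov / (sx * sy)
```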
Before running it, change the settings according to your system: open the script and edit the initial global variables:
- PATH is the path to the saves dir of WAET
- DISCARD_BEFORE_THAN defines a date before which answers are discarded; this is useful for removing debug answers
- MAP_VALUES defines the mapping used to create the control groups according to the users' answers
Also note that all answers in which the user listened for less than 5 seconds or did not move the cursor are discarded entirely. This is hard-coded in the final section of the script.
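A sketch of the discarding rules described above, assuming each answer has already been parsed into a dict with hypothetical keys date, listen_time (seconds) and cursor_moved; the actual script reads WAET's saves and its field names will differ. The DISCARD_BEFORE_THAN value here is a placeholder.

```python
from datetime import datetime

# Placeholder values; the real ones are the globals at the top of the script.
DISCARD_BEFORE_THAN = datetime(2020, 1, 1)
MIN_LISTEN_SECONDS = 5.0


def keep_answer(answer):
    """answer: dict with hypothetical keys 'date', 'listen_time', 'cursor_moved'."""
    if answer["date"] < DISCARD_BEFORE_THAN:
        return False  # e.g. debug answers given before the test went live
    if answer["listen_time"] < MIN_LISTEN_SECONDS:
        return False  # listened for less than 5 seconds
    return bool(answer["cursor_moved"])  # cursor never moved: discard


def filter_answers(answers):
    return [a for a in answers if keep_answer(a)]
```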
At each run, violin plots are created for each control group and each method.
One plot is created for each question type and excerpt, or for each question
type if the average option is used. Under each plot, the p-values computed
for each combination of groups or methods are shown, together with the
error margins and correlations.
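The script computes a p-value for each pair of groups or methods, but the statistical test is not named here. The sketch below swaps in a two-sided permutation test on the difference of means, purely to illustrate the pairwise loop; the function names and the seed are hypothetical.

```python
import itertools
import random


def permutation_pvalue(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    # +1 smoothing so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def pairwise_pvalues(groups, **kwargs):
    """groups: dict name -> list of ratings; returns {(g1, g2): p-value}."""
    return {
        (g1, g2): permutation_pvalue(groups[g1], groups[g2], **kwargs)
        for g1, g2 in itertools.combinations(sorted(groups), 2)
    }
```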
Supplementary materials show some of the plots that can be generated.
To compute the linear regressions of the perceptual values, run
python -m perceptual.eval_regression. It plots the regression
predictions for various models and weights, both with and without MFCC
features. Then, it also plots the weights with only the selected features.
If you want, you can test the selected features by passing our_eval as an option
to the subjective_eval script.
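A minimal sketch of the with/without-MFCC comparison, assuming the perceptual values are regressed on a matrix of base features, optionally extended with MFCC columns, via ordinary least squares; the repo may use a different regressor, and fit_linear and compare_feature_sets are hypothetical names.

```python
import numpy as np


def fit_linear(features, targets):
    """Ordinary least-squares weights; the last entry is the intercept."""
    X = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights


def compare_feature_sets(base, mfcc, targets):
    """Fit with and without MFCC columns; return {name: (weights, train MSE)}."""
    results = {}
    for name, feats in (("without_mfcc", base), ("with_mfcc", np.hstack([base, mfcc]))):
        w = fit_linear(feats, targets)
        pred = np.column_stack([feats, np.ones(len(feats))]) @ w
        results[name] = (w, float(np.mean((pred - targets) ** 2)))
    return results
```

Training MSE can only decrease when columns are added, so a fair comparison would use held-out data.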
- Install fluidsynth, download the SalamanderGrandPianoV3 soundfont in sf2 format from our mega folder and put it in your working dir
- run
python -m perceptual.alignment.dtw_tuning
to check the FastDTW tuning in midi2midi over MusicNet solo piano songs
- run
python -m perceptual.alignment.align amt
to perform our AMT-based alignment over the SMD dataset with the best parameters found in the previous step
- run
python -m perceptual.alignment.align ewert
to perform our baseline alignment over the SMD dataset
- run
python -m perceptual.alignment.analysis results/ewert.csv results/amt.csv
to plot the results of the alignment
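The dtw_tuning step tunes FastDTW, which approximates dynamic time warping with a coarse-to-fine search restricted to a radius around the projected path. For reference, the exact quadratic DTW it approximates, on 1-D sequences such as onset times, is sketched below; this is plain DTW, not FastDTW and not the repo's alignment code.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Classic O(len(x) * len(y)) dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```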