Author: Mortadha Manai
Report link: https://github.com/MortadhaMannai/VOCAL-TRACK-EXTRACTION-USING-NEURAL-NETWORKS/blob/main/Report.pdf
Paper links:
1- Zenodo: https://zenodo.org/record/8274725
2- OpenAIRE: https://explore.openaire.eu/search/publication?pid=10.5281%2Fzenodo.8267702&fbclid=IwAR13OfUARkpyVk1jzk2fFoqaxVeNz2xbDwNySsu8vCV0FxwslG0eI8hqx90
There are four models in this project: the Deep Clustering Model, the Hybrid Deep Clustering Model, the U-net Model, and the UH-net Model. All models are trained on the DSD100 dataset. The project is based on PyTorch.
-
Data preprocessing:
- Build_Dataset.ipynb: generate the dataset from DSD100
- config.py: define project-level parameters
- data_loader.py: define the torch data loader
- mel_dealer.py: convert a music file to a mel spectrogram and convert the spectrogram back
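The spectrogram conversion that mel_dealer.py performs can be sketched with plain PyTorch. This is a minimal illustration, not the repository's actual code; the function name and STFT parameters (n_fft, hop length) are assumptions.

```python
import torch

def to_spectrogram(wave, n_fft=1024, hop=256):
    """Illustrative sketch: waveform -> magnitude spectrogram via STFT."""
    window = torch.hann_window(n_fft)
    # return_complex=True gives a complex tensor of shape (n_fft//2 + 1, frames)
    spec = torch.stft(wave, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    return spec.abs()  # magnitude; square it for a power spectrogram
```

A mel spectrogram, as the filename suggests, would additionally project the frequency axis through a mel filterbank (e.g. via torchaudio's MelSpectrogram transform).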
-
Model definition:
- unet_model.py: define the U-net Model and the UH-net Model
- cluster_model.py: define the Deep Clustering Model
- hybrid_model.py: define the Hybrid Deep Clustering Model
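The core idea behind a U-net separator such as the one in unet_model.py is an encoder-decoder with skip connections that outputs a soft mask over the input spectrogram. The toy module below is a hedged sketch of that pattern only; the real model has many more layers, and all sizes here are illustrative.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-net-style sketch: one downsample, one upsample, one skip."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Conv2d(1, ch, kernel_size=3, stride=2, padding=1)   # downsample
        self.dec = nn.ConvTranspose2d(ch, 1, kernel_size=4, stride=2, padding=1)  # upsample
        self.out = nn.Conv2d(2, 1, kernel_size=1)  # fuse skip connection
    def forward(self, x):
        h = torch.relu(self.enc(x))
        u = self.dec(h)
        # skip connection: concatenate the upsampled features with the input
        fused = torch.cat([u, x], dim=1)
        return torch.sigmoid(self.out(fused))  # soft mask in [0, 1]
```

The sigmoid output is interpreted as a time-frequency mask that is multiplied with the mixture spectrogram to isolate the vocal track.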
-
Model training:
- utils.py: define loss functions
- unet_train.py: training functions for the U-net / UH-net models
- hd_train.py: training functions for the hybrid deep clustering model
- dc_train.py: training functions for the deep clustering model
- train_dc.ipynb, train_hybrid.ipynb and train_unet.ipynb: train the models
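For the deep clustering models, the loss in utils.py is presumably a variant of the standard deep clustering objective (Hershey et al.): minimize the Frobenius distance between the embedding affinity matrix VVᵀ and the ideal source-assignment affinity YYᵀ. A minimal sketch, computed in the usual memory-efficient expanded form (this is an assumption about the repo's implementation, not a copy of it):

```python
import torch

def deep_clustering_loss(V, Y):
    """Deep clustering objective: || V Vᵀ - Y Yᵀ ||_F² per batch item.

    V: (batch, T*F, D) unit-norm embeddings per time-frequency bin
    Y: (batch, T*F, C) one-hot dominant-source assignments
    """
    def fro2(A):  # squared Frobenius norm, per batch item
        return (A ** 2).sum(dim=(1, 2))
    # Expanded form avoids materializing the (T*F) x (T*F) affinity matrices
    loss = (fro2(V.transpose(1, 2) @ V)
            - 2 * fro2(V.transpose(1, 2) @ Y)
            + fro2(Y.transpose(1, 2) @ Y))
    return loss.mean()
```

When the embeddings exactly reproduce the ideal assignments (V = Y), the loss is zero; training pushes bins dominated by the same source toward nearby embeddings.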
-
Model evaluation:
- evaluation.py: define evaluation functions
- music_decoder.py: retrieve audio files from model outputs
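Retrieving audio from model outputs, as music_decoder.py does, typically means applying the predicted mask to the mixture's complex STFT (reusing the mixture phase) and inverting. The sketch below shows that common recipe under assumed STFT parameters; it is not the repository's decoder.

```python
import torch

def apply_mask_and_invert(mix_wave, mask, n_fft=1024, hop=256):
    """Illustrative sketch: mask the mixture spectrogram, invert to audio."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(mix_wave, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    # Scale magnitudes by the (real-valued) mask; the mixture phase is kept
    masked = spec * mask
    return torch.istft(masked, n_fft=n_fft, hop_length=hop,
                       window=window, length=mix_wave.shape[-1])
```

With an all-ones mask this round-trips the waveform almost exactly; a learned vocal mask instead suppresses the accompaniment's time-frequency bins.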
Results:
Original music (vocal track)
==> Hybrid Deep Clustering Model
==> U-net Model
==> UH-net Model
- Masked power spectrograms
- Generated masks

