This repository makes the API calls needed to download UniProt data to S3.
The first step downloads exclusion_branches.tsv and ncbitaxon_removed_subset.json to the data/raw directory. The ncbitaxon_removed_subset.json file restricts the UniProt query to the set of microbes from the kg-microbe repository.
To run, execute the make all command.
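As a rough illustration of the kind of API call involved, the sketch below builds a UniProt REST search URL for a single NCBITaxon ID. It is not taken from this repository: the function name build_uniprot_query_url and the choice of format and page size are assumptions made for the example, and the actual query logic in main.py may differ.

```python
from urllib.parse import urlencode

# Public UniProtKB search endpoint.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_query_url(taxon_id, fmt="tsv", size=500):
    """Build a search URL for all UniProtKB entries of one organism.

    Hypothetical helper for illustration; not part of this repository.
    """
    params = {
        "query": f"organism_id:{int(taxon_id)}",  # restrict to one taxon
        "format": fmt,                            # response format
        "size": size,                             # results per page
    }
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # 562 is the NCBITaxon ID for Escherichia coli.
    print(build_uniprot_query_url(562))
```

The URL is only constructed here, not requested, so the sketch runs without network access.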
Switch to the human_query branch and execute the make uniprot-download command.
Switch to the build_custom_microbial_sets branch. Add a txt file containing all NCBITaxon IDs in the desired subset to the data/raw directory (an example, wallen_etal_microbes.txt, is included). If you use a different filename, update the ORGANISM_RESOURCE variable in main.py to match.
To run, execute the make uniprot-download command.
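To make the expected input more concrete, here is a minimal sketch of how a custom subset file could be read. It assumes one ID per line, written either as NCBITaxon:562 or as a bare number; that layout, the function name load_taxon_ids, and the sample filename are assumptions for illustration, so check wallen_etal_microbes.txt for the format the code on this branch actually expects.

```python
import tempfile
from pathlib import Path


def load_taxon_ids(path):
    """Return the NCBITaxon IDs listed in a text file as integers.

    Hypothetical helper for illustration; not part of this repository.
    """
    taxon_ids = []
    for line in Path(path).read_text().splitlines():
        token = line.strip()
        if not token:
            continue  # skip blank lines
        # Accept both "NCBITaxon:562" and bare "562" (assumed formats).
        taxon_ids.append(int(token.split(":")[-1]))
    return taxon_ids


if __name__ == "__main__":
    # Write a small sample file to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "my_microbes.txt"
        sample.write_text("NCBITaxon:562\n1280\n\n")
        print(load_taxon_ids(sample))  # prints [562, 1280]
```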
This cookiecutter project was generated from the monarch-project-template and will be kept up to date using cruft.