KEGG GhostKOALA workflow

This folder contains the files needed to rerun the analysis of KEGG GhostKOALA annotations. This analysis appears in Figure 4b (http://dx.doi.org/10.1101/462788).

To run the workflow, clone this repository, change directories into the folder, make sure you have snakemake installed, and run snakemake.

If you have conda installed, you can run the following:

conda create -n hu python=3.6
source activate hu
conda install snakemake
cd pipeline-analyses/kegg_snakemake
snakemake --use-conda

The GhostKOALA output files are downloaded as the first step in the workflow. To recreate these yourself, do the following:

1. Generate the PLASS-assembled amino acid sequences for each neighborhood using the pipeline-base Snakefile.
2. Concatenate these sequences together.
3. Upload the concatenated sequences to KEGG GhostKOALA.
4. Select the "genus_prokaryotes + family_eukaryotes + viruses" database and run GhostKOALA.
5. Download the annotation data and the taxonomy data.
6. For the annotation data, click "Preview first 100", and then select "Download detail."
7. Save the file as "outputs/GhostKOALA/nbhd_user_ko_definition.txt".
8. For the taxonomy data, click download. Save the file as "outputs/GhostKOALA/nbhd.user.out.top.gz".
9. Then, unzip the file: gunzip outputs/GhostKOALA/nbhd.user.out.top.gz
10. Generate the prokka amino acid sequences for each bin using the pipeline-analyses/bin_prokka_snakemake Snakefile.
11. Follow steps 3-9, this time replacing nbhd in the filename with bin.
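
The concatenation step above can be sketched as a small shell function. This is a minimal sketch, not the command the authors used: the function name concat_faa, the .faa extension, and any paths you pass to it are assumptions, so adjust them to match where your PLASS (or prokka) output actually lives.

```shell
# Concatenate every .faa FASTA file in a directory into one upload-ready file.
# Hypothetical helper: the name, the .faa extension, and the paths are assumptions.
# Usage: concat_faa <input_dir> <output_file>
concat_faa() {
    in_dir="$1"
    out_file="$2"
    # Fail early if the directory contains no .faa files.
    if ! ls "$in_dir"/*.faa >/dev/null 2>&1; then
        echo "no .faa files found in $in_dir" >&2
        return 1
    fi
    # Join all per-sample files into a single FASTA file.
    cat "$in_dir"/*.faa > "$out_file"
    # Print the number of sequences written (one '>' header line per sequence).
    grep -c '^>' "$out_file"
}
```

For example, concat_faa path/to/plass_output nbhd_all.faa would write the combined file and print how many sequences it contains. Write the output file outside the input directory (or with a different extension) so that a later rerun does not fold the combined file back into itself.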
