Taxonomy-comparison

Repository for comparing read-level taxonomy annotations between MG-RAST and MGnify.

RUNNING ==============================

MGnify
Run:

/hps/nobackup2/production/metagenomics/pipeline/tools-v4/miniconda2-4.0.5/bin/python  /hps/nobackup2/production/metagenomics/production-scripts/current/mgportal/analysis-pipeline/python/pipelineInitialisation.py -s /hps/nobackup2/production/metagenomics/production-scripts/current/mgportal/analysis-pipeline/python -f recalc -l 100 -y /hps/nobackup2/production/metagenomics/pipeline/tools/miniconda2-4.0.5/bin/python -o <outdir> -p <fasta> OR <1.fq,2.fq>

Necessary files:

taxonomy-results/SSU/SRR6367227_MERGED_FASTQ_SSU.fasta.mseq

MG-RAST
For FASTA - run amplicon-fasta.workflow.cwl
For FASTQ - run amplicon-fastq.workflow.cwl (important: the input must be an interleaved FASTQ file)
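MGnify accepts a forward/reverse pair (<1.fq,2.fq>), but the MG-RAST FASTQ workflow needs one interleaved file. If only separate files are available, a minimal Python sketch for interleaving them could look like this. It assumes plain, uncompressed four-line FASTQ records stored in the same order in both files. It does not check that read names match, and the function names are illustrative, not part of either pipeline.

```python
from itertools import zip_longest
from typing import Iterator, List, TextIO


def read_fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines: header, sequence, '+', quality."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        if not all(record):
            raise ValueError("truncated FASTQ record: " + record[0].strip())
        # Make sure every line ends with a newline (the last line of a file may not).
        yield [line if line.endswith("\n") else line + "\n" for line in record]


def interleave_fastq(forward: TextIO, reverse: TextIO, out: TextIO) -> int:
    """Write forward/reverse records alternately (R1, R2, R1, R2, ...).

    Returns the number of read pairs written.
    """
    pairs = 0
    for r1, r2 in zip_longest(read_fastq_records(forward), read_fastq_records(reverse)):
        if r1 is None or r2 is None:
            raise ValueError("forward and reverse files contain different numbers of reads")
        out.writelines(r1)
        out.writelines(r2)
        pairs += 1
    return pairs
```

Pass two handles opened for reading and one opened for writing. The interleaved output file is then the path to put in amplicon-fastq.job.yaml.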
Run:

Install from GitHub
Create a virtual environment (venv) with Miniconda
Start an interactive LSF session: bsub -M 5000 -Is $SHELL
Activate the venv
Add the path of the FASTA/FASTQ file to the job YAML file
Run with Singularity:

cwltool --cachedir .cache --singularity --no-match-user amplicon-fastq.workflow.cwl amplicon-fastq.job.yaml
or 
cwltool --cachedir .cache --singularity --no-match-user amplicon-fasta.workflow.cwl amplicon-fasta.job.yaml

Necessary files:

amp_fq_test.440.cluster.rna97.mapping - list of clusters with all their members
amp_fq_test.440.cluster.rna97.fna - annotations of the main member of each cluster

PARSING ==============================

Run python_comparison.py

PLOT ==============================

Copy all lines after http://sankeymatic.com/build/ into the web visualiser.
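SankeyMATIC takes one flow per line in the form "Source [amount] Target". A small helper that turns comparison counts into such lines might look like the sketch below. The function name and the {(source, target): count} input layout are assumptions for illustration, not the repository's actual output code.

```python
from typing import Dict, List, Tuple


def to_sankeymatic(flows: Dict[Tuple[str, str], int]) -> List[str]:
    """Turn {(source, target): count} into SankeyMATIC lines 'Source [count] Target'.

    Zero-count flows are dropped, and the output is sorted so it is stable between runs.
    """
    return [
        f"{source} [{count}] {target}"
        for (source, target), count in sorted(flows.items())
        if count > 0
    ]
```

Printing "\n".join(to_sankeymatic(flows)) gives text that can be pasted straight into the SankeyMATIC input box. Node names must not contain square brackets, because SankeyMATIC uses them to delimit the amount.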

==============================

Main ideas of comparison

The comparison is done level by level, using only reads annotated by both pipelines. First, for both MGnify and MG-RAST, count the reads annotated deeper than superkingdom (sk) and the reads whose annotation stops at the sk level. Then keep only the reads annotated deeper than sk and repeat the same counts for kingdom, phylum, and so on.
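The level-by-level procedure can be sketched in Python as follows. This is a minimal illustration, not the repository's reads_comparison.py. It assumes each pipeline's result is a dict mapping read ID to a prefixed lineage list such as ["sk__Bacteria", "k__", "p__Firmicutes"]. It also assumes that only reads annotated deeper than the current level by both pipelines move on to the next level. An empty "k__" does not count as the annotation stopping there. Names are compared with exact equality; the fuzzier rules described in this README could replace that check.

```python
from typing import Dict, List

# Assumed rank order for prefixed lineages such as "sk__Bacteria;k__;p__Firmicutes".
RANKS = ["sk", "k", "p", "c", "o", "f", "g", "s"]


def taxon_name(lineage: List[str], level: int) -> str:
    """Name at a level without its rank prefix ('' if the level is missing or empty)."""
    if level >= len(lineage):
        return ""
    rank, sep, name = lineage[level].partition("__")
    return name if sep else lineage[level]


def depth(lineage: List[str]) -> int:
    """Number of levels down to the deepest non-empty rank."""
    deepest = 0
    for level in range(len(lineage)):
        if taxon_name(lineage, level):
            deepest = level + 1
    return deepest


def compare_by_level(mgnify: Dict[str, List[str]], mgrast: Dict[str, List[str]]) -> List[dict]:
    reads = set(mgnify) & set(mgrast)  # only reads annotated by both pipelines
    d_mgnify = {r: depth(mgnify[r]) for r in reads}
    d_mgrast = {r: depth(mgrast[r]) for r in reads}
    report = []
    for level, rank in enumerate(RANKS):
        report.append({
            "rank": rank,
            "reads": len(reads),
            "stopped_mgnify": sum(d_mgnify[r] == level + 1 for r in reads),
            "stopped_mgrast": sum(d_mgrast[r] == level + 1 for r in reads),
            "deeper_mgnify": sum(d_mgnify[r] > level + 1 for r in reads),
            "deeper_mgrast": sum(d_mgrast[r] > level + 1 for r in reads),
            "same": sum(taxon_name(mgnify[r], level) == taxon_name(mgrast[r], level) for r in reads),
        })
        # Continue only with reads annotated deeper than this level by BOTH pipelines.
        reads = {r for r in reads if d_mgnify[r] > level + 1 and d_mgrast[r] > level + 1}
    return report
```

Each row of the report gives the "stopped here" and "annotated deeper" counts for one rank. These are the numbers that feed the Sankey plot.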

MG-RAST pipeline:
Add lines to the pipeline CWL file so that it outputs the necessary files! MG-RAST annotates only the main member of each cluster; every other member of the cluster has the same taxonomy as its main member.
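Propagating the main member's taxonomy to the rest of its cluster could be sketched as below. The layout of the .mapping file is an assumption here: tab-separated lines with the cluster ID, the main member ID and a comma-separated list of the other members (further columns are ignored). Check it against a real amp_fq_test.440.cluster.rna97.mapping file. How main_taxonomy is built from the .fna file is also left open, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def propagate_cluster_taxonomy(mapping_lines: Iterable[str],
                               main_taxonomy: Dict[str, str]) -> Dict[str, str]:
    """Give every cluster member the taxonomy of its cluster's main member.

    Assumed (hypothetical) line layout: cluster_id <TAB> main_id <TAB> member1,member2,...
    """
    read_taxonomy = dict(main_taxonomy)  # main members keep their own annotation
    for line in mapping_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        main_id, members = fields[1], fields[2]
        if main_id not in main_taxonomy:
            continue  # main member has no annotation, so nothing to propagate
        for member in members.split(","):
            member = member.strip()
            if member:
                read_taxonomy[member] = main_taxonomy[main_id]
    return read_taxonomy
```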
Kingdom
The kingdom level is sometimes absent from taxonomic trees. MG-RAST seems to skip it and annotates {superkingdom, phylum, class,…}, whereas MGnify writes an empty “k__”. Solution:

add the rank prefixes (“sk__”, “k__”, and so on) to all MG-RAST annotations
insert an empty kingdom (“k__”) after the superkingdom in every MG-RAST annotation
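Both fixes can be applied in one normalisation step. The Python sketch below assumes an MG-RAST lineage arrives as a list of plain names in the order superkingdom, phylum, class, order, family, genus, species. That order and the function name are assumptions made for illustration.

```python
from typing import List

# Assumed MG-RAST rank order (no kingdom level).
MGRAST_RANKS = ["sk", "p", "c", "o", "f", "g", "s"]


def normalise_mgrast(names: List[str]) -> List[str]:
    """Add rank prefixes and insert an empty kingdom right after the superkingdom."""
    prefixed = [f"{rank}__{name.strip()}" for rank, name in zip(MGRAST_RANKS, names)]
    if prefixed:
        prefixed.insert(1, "k__")  # empty kingdom, matching MGnify's "k__"
    return prefixed
```

Joining the result with ";" gives strings such as "sk__Bacteria;k__;p__Firmicutes;c__Bacilli" that line up rank by rank with MGnify lineages.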

MG-RAST (class)
MG-RAST sometimes copies the class into a deeper field that has no annotation of its own, appending “(class)” to the name. Field names must therefore be compared with “(class)” removed. For example, NAME and NAME (CLASS) are the same annotation, but a plain string comparison treats them as different.
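Removing the suffix before comparing could look like this minimal Python sketch. It assumes the suffix always appears at the end of the name, in any letter case and possibly preceded by spaces. The helper names are hypothetical.

```python
import re

# "(class)" at the end of a name, any case, with optional surrounding spaces.
_CLASS_SUFFIX = re.compile(r"\s*\(class\)\s*$", re.IGNORECASE)


def strip_class_suffix(name: str) -> str:
    """Return the name without a trailing '(class)' marker."""
    return _CLASS_SUFFIX.sub("", name).strip()


def same_name(a: str, b: str) -> bool:
    """Compare two taxon names, ignoring a trailing '(class)' on either side."""
    return strip_class_suffix(a) == strip_class_suffix(b)
```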
MG-RAST Unclassified
MG-RAST often annotates some fields as “unclassified came from …”. Such annotations carry no information, so skip them. Be careful with cases such as sk__…;k__unclassified;p__unclassified;c__CLASSIFIED: here some fields are simply absent from the taxonomic tree while a deeper level is still classified, so these reads must stay in the comparison.
MG-RAST uncultured
Skip all annotations containing “uncultured”.
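One way to apply the unclassified and uncultured rules without dropping reads that are classified further down is to blank the uninformative fields but keep their positions. This Python sketch assumes prefixed fields such as "p__unclassified". It treats "uncultured" at the field level, like "unclassified", which is an interpretation rather than something the rules above state. The helper names are hypothetical.

```python
from typing import List, Tuple


def taxon_parts(taxon: str) -> Tuple[str, str]:
    """Split 'p__Firmicutes' into ('p', 'Firmicutes'); unprefixed names get rank ''."""
    rank, sep, name = taxon.partition("__")
    return (rank, name) if sep else ("", taxon)


def is_informative(taxon: str) -> bool:
    """False for empty, 'unclassified ...' or '... uncultured ...' names."""
    name = taxon_parts(taxon)[1].strip().lower()
    return bool(name) and not name.startswith("unclassified") and "uncultured" not in name


def clean_lineage(lineage: List[str]) -> List[str]:
    """Blank uninformative fields but keep their positions, so deeper classified
    ranks (e.g. sk__...;k__unclassified;p__unclassified;c__X) stay comparable."""
    cleaned = []
    for taxon in lineage:
        if is_informative(taxon):
            cleaned.append(taxon)
        else:
            rank, _ = taxon_parts(taxon)
            cleaned.append(f"{rank}__" if rank else "")
    return cleaned
```

Because the positions are kept, a read whose class is still classified remains deeper than phylum and continues through the level-by-level comparison.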
Levenshtein distance
Some fields in the MGnify and MG-RAST taxonomies can differ by one or two letters, for example "c__Fusobacteriia" and "c__Fusobacteria". It therefore makes sense to compute the Levenshtein distance (LD) between the two strings and to treat the annotations as different only if LD > 3.
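A dependency-free Python sketch of this rule uses the standard dynamic-programming edit distance (insertions, deletions and substitutions each cost 1) with the threshold of 3 given above. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; the row length is len(b) + 1
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


MAX_DISTANCE = 3  # annotations are "different" only if LD > 3


def annotations_differ(a: str, b: str, max_distance: int = MAX_DISTANCE) -> bool:
    return levenshtein(a, b) > max_distance
```

With this rule "c__Fusobacteriia" and "c__Fusobacteria" (LD = 1) count as the same annotation. Very short names can fall under the threshold by chance, so the cut-off may need tuning.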
