Dear Jens-Uwe,
I am trying to reproduce the ONT read classification presented in your paper.
I downloaded the Zymo read dataset (https://nanopore.s3.climb.ac.uk/mock/Zymo-GridION-EVEN-3Peaks-R103-merged.fq.gz) and the RefSeq index you provide.
I filtered the dataset to keep only the 426213 reads listed in the supplementary file (ZymoR103-groundTruth.binning).
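For reference, the filtering step can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact procedure used: it assumes the binning file is tab-separated with the read ID in the first column and header lines starting with `@`, and that the FASTQ file uses strict 4-line records. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Set


def load_read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect read IDs from a binning file (assumed: tab-separated, ID in column 1)."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '@' header lines (assumed header convention).
        if not line or line.startswith("@"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def filter_fastq(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose read ID is in `keep`."""
    it = iter(lines)
    # Consume four lines at a time; an incomplete trailing record is dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        # The read ID is the first whitespace-delimited token after the leading '@'.
        fields = header[1:].split()
        if fields and fields[0] in keep:
            yield from (header, seq, plus, qual)


def write_filtered(fastq_gz_path: str, binning_path: str, out_path: str) -> None:
    """Filter a gzipped FASTQ file down to the reads named in the binning file."""
    with open(binning_path) as binning:
        keep = load_read_ids(binning)
    with gzip.open(fastq_gz_path, "rt") as reads, open(out_path, "w") as out:
        out.writelines(filter_fastq(reads, keep))
```

Calling `write_filtered("Zymo-GridION-EVEN-3Peaks-R103-merged.fq.gz", "ZymoR103-groundTruth.binning", "ZymoR103-groundTruth.reads.fq")` would produce the query file used below, provided the assumptions about the file layouts hold.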
I then ran the commands described in the supplemental methods, using taxor version 0.1.3:
taxor search --index-file refseq-abfv-k22-s12.hixf --query-file ZymoR103-groundTruth.reads.fq --output-file zymo_refseq_mapped.search.txt --error-rate 0.15
taxor profile --search-file zymo_refseq_mapped.search.txt --cami-report-file zymo_refseq_mapped.report --seq-abundance-file zymo_refseq_mapped.abundance --binning-file zymo_refseq_mapped.binning --sample-id zymo_mapped
This is the top of the abundance file it produced (zymo_refseq_mapped.abundance):
@SAMPLEID:zymo_mapped
@Version:0.10.0
@Ranks:superkingdom|phylum|class|order|family|genus|species
@@TaXiD RANK TAXPATH TAXPATHSN PERCENTAGE
unclassified no rank - - 34.8245
2 superkingdom 2 Bacteria 62.8096
2759 superkingdom 2759 Eukaryota 1.18561
1224 phylum 2|1224 Bacteria|Pseudomonadota 38.0304
1239 phylum 2|1239 Bacteria|Bacillota 24.7792
4890 phylum 2759|4890 Eukaryota|Ascomycota 0.800521
For comparison, this is the top of the file ZymoR103-groundTruth.abundance:
@SAMPLEID:ZymoR10.3
@Version:0.10.0
@Ranks:superkingdom|phylum|class|order|family|genus|species
@@TaXiD RANK TAXPATH TAXPATHSN PERCENTAGE
unclassified no rank - - 8.22434
2 superkingdom 2 Bacteria 88.6767
2759 superkingdom 2759 Eukaryota 2.25609
1224 phylum 2|1224 Bacteria|Pseudomonadota 38.9176
1239 phylum 2|1239 Bacteria|Bacillota 49.7591
4890 phylum 2759|4890 Eukaryota|Ascomycota 1.51353
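To make the comparison easier to repeat, the two profiles can be diffed with a short script. This is a minimal sketch under stated assumptions: the abundance files are tab-separated with the five columns shown in the header line, lines starting with `@` are headers, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_profile(lines: Iterable[str]) -> Dict[str, float]:
    """Map each taxon name (TAXPATHSN, or the first column if that is '-') to PERCENTAGE."""
    profile = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '@' header lines.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")  # assumed: TAXID, RANK, TAXPATH, TAXPATHSN, PERCENTAGE
        name = fields[3] if fields[3] != "-" else fields[0]
        profile[name] = float(fields[-1])
    return profile


def compare_profiles(
    mine: Dict[str, float], truth: Dict[str, float]
) -> List[Tuple[str, float, float, float]]:
    """Return (name, mine, truth, mine - truth) rows, largest absolute difference first."""
    rows = []
    for name in set(mine) | set(truth):
        a = mine.get(name, 0.0)  # a taxon missing from one profile counts as 0 %
        b = truth.get(name, 0.0)
        rows.append((name, a, b, a - b))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows
```

Parsing both files with `parse_profile` and passing the results to `compare_profiles` lists the taxa that differ most at the top, which makes it easier to see where the missing reads should have been assigned.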
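What could explain the much higher rate of unclassified reads in my attempt to repeat your analysis? Unclassified reads rise from 8.2% to 34.8%, and the gap appears to come almost entirely from Bacillota (24.8% versus 49.8%), while Pseudomonadota is nearly unchanged.
Best regards,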