
Suggested error-rate setting for PacBio HiFi reads #19

@tpshea2


Hello-

I see in the Supplemental Methods that an error-rate of 0.15 was used for all reads (both Nanopore and PacBio).

Can you suggest a good starting value for error-rate with recently generated, highly accurate PacBio HiFi reads?

I know my data are highly accurate, so I initially set error-rate to 0.01. However, across my 14 samples a very large fraction of reads ended up unclassified/no rank (79-94% depending on the sample). I used the refseq-abfv-k22-s12.hixf index. Increasing error-rate to 0.05, and then to 0.15, reduced the unclassified fraction, but it remains fairly high (% of reads unclassified):

| Sample | % unclassified (--error-rate 0.01) | % unclassified (--error-rate 0.05) | % unclassified (--error-rate 0.15) |
|---|---|---|---|
| 1 | 86.1 | 76.2 | 66.3 |
| 2 | 79.4 | 65.1 | 49.2 |
| 3 | 92.6 | 84.0 | 71.5 |
| 4 | 92.7 | 83.2 | 69.5 |
| 5 | 87.4 | 75.5 | 63.2 |
| 6 | 93.4 | 82.2 | 67.3 |
| 7 | 91.1 | 80.7 | 67.4 |
| 8 | 93.4 | 76.1 | 57.5 |
| 9 | 91.9 | 84.1 | 74.1 |
| 10 | 88.0 | 77.8 | 64.7 |
| 11 | 81.1 | 63.6 | 46.1 |
| 12 | 84.9 | 74.0 | 61.4 |
| 13 | 94.4 | 87.0 | 75.7 |
| 14 | 89.7 | 80.0 | 68.7 |
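For a rough sense of why the error-rate setting has such a large effect on k-mer matching, here is a minimal sketch under a simplified model. It assumes k = 22 (as the "k22" in the index name suggests) and that errors are independent per-base substitutions at rate p, so the expected fraction of a read's 22-mers containing no error is (1 - p)^22. The function name is hypothetical, and this is only an illustration, not the classifier's actual threshold computation.

```python
def expected_intact_kmer_fraction(error_rate: float, k: int = 22) -> float:
    """Expected fraction of k-mers with no error, assuming independent
    per-base substitutions at `error_rate` (simplified model, hypothetical helper)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    # A k-mer is intact only if all k of its bases are error-free.
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    # The three settings tried in the table above.
    for p in (0.01, 0.05, 0.15):
        frac = expected_intact_kmer_fraction(p)
        print(f"error rate {p:.2f}: {frac:.3f} of 22-mers expected intact")
```

Under this model, an error rate of 0.01 corresponds to roughly 80% of 22-mers expected to match, versus about 32% at 0.05 and about 3% at 0.15. If the tool's matching threshold scales in a similar way, a higher setting is far more permissive, which would be consistent with the trend in the table.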

I will download the GTDB Release 220 index and try that, but in the meantime I would appreciate any suggestions for increasing the percentage of classified reads.

Thank you.
