Sample size for error estimation is too small! #28

Hello,

First of all: thanks for this great tool, which provides a really comprehensive set of metrics, with neat visualisations as well!

To evaluate the error rate, AlignQC samples the best alignments from the first n reads such that n % 100 = 1 and the total alignment length reaches at least 1,000,000 bases. For datasets with an average read length over 2000 bases, this results in a sample of only about 501 reads. As a consequence, the reported error rates can be noticeably influenced by the number of threads used during the mapping step. This is because higher-quality reads often map more quickly and therefore tend to appear earlier in the BAM file, so they are overrepresented in the sample AlignQC analyzes.
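To make the arithmetic concrete, here is a minimal sketch of the stopping rule as described above. It is not AlignQC's actual code: the function name `sampled_read_count` and its parameters are hypothetical, and it assumes that each read's best alignment is about as long as the read itself.

```python
def sampled_read_count(read_lengths, min_bases=1_000_000, modulus=100, remainder=1):
    """Return the first n with n % modulus == remainder at which the
    cumulative length of the first n reads is at least min_bases.

    Hypothetical helper: read length stands in for alignment length.
    Returns None if the threshold is never reached.
    """
    total = 0
    for n, length in enumerate(read_lengths, start=1):
        total += length
        if total >= min_bases and n % modulus == remainder:
            return n
    return None


# 10,000 reads of exactly 2000 bases each: sampling stops after 501 reads.
print(sampled_read_count([2000] * 10_000))
```

With longer reads the sample shrinks further: under the same assumptions, reads of 5000 bases would stop the sampling after 201 reads.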

I tested this behavior using a small dataset of corrected long reads aligned with Minimap2, once using 1 thread and once using 128 threads. The resulting AlignQC-reported error rates were:
- 0.387% with 128 threads
- 0.447% with 1 thread
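
To put the gap between the two runs in perspective, the relative difference can be computed directly from the two reported rates. This is just arithmetic on the numbers above; the helper name `relative_increase` is made up for illustration.

```python
def relative_increase(baseline, other):
    """Relative change of `other` with respect to `baseline`, as a fraction."""
    return (other - baseline) / baseline


rate_128_threads = 0.387  # reported error rate in percent, 128 threads
rate_1_thread = 0.447     # reported error rate in percent, 1 thread

change = relative_increase(rate_128_threads, rate_1_thread)
print(f"{change:.1%}")  # 1-thread rate relative to the 128-thread rate
```

So the same reads, mapped with the same aligner, give an error rate roughly 15% higher in relative terms when mapped with a single thread.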

Best, simon.
