Hello,
first of all: thanks for this great tool, which provides a comprehensive set of metrics along with neat visualisations!
To evaluate the error rate, AlignQC samples the best alignments from the first n reads such that n % 100 = 1 and the total alignment length reaches at least 1,000,000 bases. For datasets with an average read length over 2000 bases, this results in a sample of only about 501 reads. As a consequence, the reported error rates can be significantly influenced by the number of threads used during the mapping step. This is because higher-quality reads often map more quickly and therefore tend to appear earlier in the BAM file, so they are overrepresented in the sample AlignQC analyzes.
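To make the arithmetic behind the "about 501 reads" figure concrete, here is a minimal sketch of the stopping rule as I understand it from the description above. It is not AlignQC's actual code; the function name `sample_size` and its parameters are made up for illustration, and the fixed read length of 2000 is an assumption.

```python
def sample_size(alignment_lengths, min_bases=1_000_000, step=100):
    """Return how many reads get sampled under the rule described above.

    Hypothetical re-implementation for illustration only: reads are taken
    in file order, and sampling stops at the first n with n % step == 1
    once the cumulative alignment length has reached min_bases.
    """
    total = 0
    for n, length in enumerate(alignment_lengths, start=1):
        total += length
        if n % step == 1 and total >= min_bases:
            return n
    # Input exhausted before the stopping condition was met.
    return len(alignment_lengths)


# Assumed toy input: 10,000 alignments of exactly 2000 bases each.
print(sample_size([2000] * 10_000))
```

With 2000-base alignments the threshold is reached at read 500, and the next n with n % 100 = 1 is 501. Because only the head of the file is inspected, whatever ends up first in the BAM determines the whole estimate.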
I tested this behavior using a small dataset of corrected long reads aligned with Minimap2, once using 1 thread and once using 128 threads. The resulting AlignQC-reported error rates were:
- 0.387% with 128 threads
- 0.447% with 1 thread
Best, simon.