Description
assembly-stats 1.0.1 produces the following wrong output on one of our assemblies:
sum = 3595598458, n = 896, ave = 4012944.71, largest = 1193142484
N50 = 844155996, n = 2
N60 = 836758171, n = 3
N70 = 836758171, n = 3
N80 = 762492267, n = 4
N90 = 762492267, n = 4
N100 = 18446744071783539901, n = 896
N_count = 178800
Gaps = 1788
Command: assembly-stats scaffolds.fa >scaffolds.stats. The machine is a rather ordinary Linux server.
Correct output from another tool:
Filepath TotSeqs TotLen N50 N75 N90 I50 GC Avg Min Max AuN
scaffolds.fa 896 7890565754 1193142484 836758171 729054925 891 44.80 8806434.99 17617 2368955581 1224440811
In particular, please see that assembly-stats shows the total assembly length as 3,595,598,458, while it should be 7,890,565,754. Also note the N100 of 18446744071783539901.
Also, assembly-stats works fine on our other assemblies of similar total size, but consisting of smaller scaffolds.
Here is the test input: https://biokirr.com/Supporting-Data/assembly-stats-bug-report/scaffolds-N.fa.zstd - It's the same assembly filled with N, so it's only 682 kB compressed. (Even more compact in NAF format: https://biokirr.com/Supporting-Data/assembly-stats-bug-report/scaffolds-N.fa.naf - 125 kB). Decompressed size is 8 GB.
I guess there is some kind of integer overflow, so I hope it will be easy to fix. Please let me know if you need any other information, or the full repro script.
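To back up the overflow guess, here is a small arithmetic check. It is only a sketch: I have not looked at the assembly-stats source, so the assumption that lengths are held in 32-bit integers is mine, and the helper names (`wrap_u32`, `int32_as_uint64`) are made up for illustration. Under that assumption, both odd numbers in the output can be reproduced from the correct values reported by the other tool.

```python
# Hypothetical helpers that imitate fixed-width integer behaviour.
# Assumption (not verified against the assembly-stats source):
# sequence lengths and their sum are stored in 32-bit integers.

def wrap_u32(value: int) -> int:
    """Keep only the low 32 bits, as an unsigned 32-bit variable would."""
    return value & 0xFFFFFFFF

def int32_as_uint64(value: int) -> int:
    """Reinterpret the low 32 bits as a signed int32, then widen to uint64."""
    low = value & 0xFFFFFFFF
    signed = low - (1 << 32) if low >= (1 << 31) else low
    return signed & 0xFFFFFFFFFFFFFFFF

true_total = 7890565754    # TotLen from the other tool
true_longest = 2368955581  # Max from the other tool

print(wrap_u32(true_total))           # 3595598458, the "sum" printed by assembly-stats
print(int32_as_uint64(true_longest))  # 18446744071783539901, the reported N100
```

Both values match the wrong output exactly, which would also explain why assemblies of similar total size but with smaller scaffolds behave differently: only this one has a scaffold longer than 2^31 - 1 = 2,147,483,647 bases.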
EDIT: Added test data.