Description
Hi there,
We want to report an issue with a PanGraph that we generated on a dataset representing 1000 E. coli sequences. We believe that 64 of these sequences are not represented correctly by the PanGraph.
Since the sequence lengths implied by the PanGraph also appear to be wrong, we verified the issue manually by computing the length of one of the mismatching sequences. We did this by summing the lengths of the consensus sequences of the blocks on its path, adding the lengths of the insertions in the sequence, and subtracting the lengths of the deletions on the path.
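To make the check concrete, here is a minimal sketch of the length computation described above. It does not parse the PanGraph output itself: the `BlockVisit` class, its field names, and the example numbers are hypothetical stand-ins for whatever is extracted from the graph for each block on a path.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockVisit:
    """One block occurrence on a path (hypothetical container, not PanGraph's format)."""
    consensus_length: int                                        # length of the block's consensus sequence
    insertion_lengths: List[int] = field(default_factory=list)  # insertions this sequence carries in the block
    deletion_lengths: List[int] = field(default_factory=list)   # deletions this sequence carries in the block


def path_length(visits: List[BlockVisit]) -> int:
    """Sum consensus lengths, add insertion lengths, subtract deletion lengths."""
    total = 0
    for visit in visits:
        total += visit.consensus_length
        total += sum(visit.insertion_lengths)
        total -= sum(visit.deletion_lengths)
    return total


# Made-up example path with three blocks.
example_path = [
    BlockVisit(consensus_length=1000, insertion_lengths=[3], deletion_lengths=[10]),
    BlockVisit(consensus_length=500),
    BlockVisit(consensus_length=250, insertion_lengths=[2, 5]),
]
print(path_length(example_path))
```

Comparing this value with the length of the corresponding record in the input FASTA is how a mismatch like the one reported here would show up.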
For the sequence ‘NZ_AP019856.1’, the length computed from the PanGraph is 4800017 bases. However, its true length is 4800098 bases, a difference of 81 bases.
We have uploaded the three relevant files to the following folder: https://drive.google.com/drive/folders/1JAliSaWokYX2i5KaUjQiOPnCdL_uyZqG?usp=sharing
We believe the mismatching sequences are: NZ_AP019856.1, NZ_CP054407.1, NZ_CP010219.1, NZ_CP036202.1, NZ_CP014583.1, NZ_CP027587.1, NZ_CP027325.1, NZ_CP013029.1, NZ_CP027459.1, NZ_CP050865.1, NZ_CP050862.1, NZ_CP027534.1, NZ_CP014316.1, NZ_CP015085.1, NZ_CP018970.1, NZ_CP023826.1, NZ_CP032201.1, NZ_CP023844.1, NZ_CP015138.1, NZ_CP018983.1, NZ_CP018991.1, NZ_CP049077.2, NZ_CP010876.1, NZ_CP036245.1, NZ_CP049085.2, NZ_CP035476.1, NZ_CP035477.1, NZ_CP014522.1, NZ_CP014495.1, NZ_CP024720.1, NZ_CP024717.1, NZ_CP021207.1, NZ_CP019008.1, NZ_CP019020.1, NZ_CP035498.1, NZ_CP053245.1, NZ_CP037449.1, NZ_CP048304.1, NZ_CP048920.1, NZ_CP040456.1, NZ_CP024886.1, NZ_CP051700.1, NZ_CP030111.1, NZ_AP022650.1, NZ_CP053251.2, NZ_CP051688.1, NZ_CP033762.1, NZ_CP019273.1, NZ_AP017610.1, NZ_CP033850.1, NZ_CP019029.1, NZ_CP015834.1, NZ_CP009859.1, NZ_CP040919.1, NZ_CP023366.1, NZ_CP041300.1, NZ_CP033605.1, NZ_CP041452.1, NZ_CP041448.1, NZ_CP028166.1, NZ_AP021896.1, NZ_CP031833.1
Thanks,
Harsh