Incorrect E. coli sequences being represented by PanGraph (large dataset) #68

@TheHarshShow

Hi there,

We want to report an issue with a PanGraph that we generated from a dataset of 1000 E. coli sequences. We believe that 64 of these sequences are not represented correctly by the PanGraph.

Since the sequence lengths also appear to be wrong, we verified the issue manually by computing the length of one of the mismatching sequences as the PanGraph represents it. We did this by summing the lengths of the consensus sequences of the blocks on its path, adding the lengths of the insertions on the path, and subtracting the lengths of the deletions on the path.

We find that the PanGraph computes the length of the sequence ‘NZ_AP019856.1’ to be 4800017 bases. However, its true length is 4800098 bases, a difference of 81 bases.
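
For anyone who wants to repeat the check, here is a minimal sketch of the length computation described above. It does not read the PanGraph JSON directly: `BlockVisit`, `path_length`, and `length_mismatch` are hypothetical names of our own, and it assumes the per-block consensus, insertion, and deletion lengths have already been extracted from the graph by some other means.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockVisit:
    """One block occurrence on a path (hypothetical structure, not PanGraph's schema)."""
    consensus_length: int                                        # length of the block's consensus sequence
    insertion_lengths: List[int] = field(default_factory=list)   # insertions carried by this sequence
    deletion_lengths: List[int] = field(default_factory=list)    # deletions carried by this sequence


def path_length(visits: List[BlockVisit]) -> int:
    """Sum consensus lengths along the path, add insertions, subtract deletions."""
    total = 0
    for visit in visits:
        total += visit.consensus_length
        total += sum(visit.insertion_lengths)
        total -= sum(visit.deletion_lengths)
    return total


def length_mismatch(visits: List[BlockVisit], true_length: int) -> int:
    """Return true_length minus the length implied by the path (0 means they agree)."""
    return true_length - path_length(visits)
```

A non-zero result from `length_mismatch` for a sequence would flag it as a candidate for the list of mismatching sequences.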

We have uploaded the three relevant files to the following folder: https://drive.google.com/drive/folders/1JAliSaWokYX2i5KaUjQiOPnCdL_uyZqG?usp=sharing

We believe the mismatching sequences are: NZ_AP019856.1, NZ_CP054407.1, NZ_CP010219.1, NZ_CP036202.1, NZ_CP014583.1, NZ_CP027587.1, NZ_CP027325.1, NZ_CP013029.1, NZ_CP027459.1, NZ_CP050865.1, NZ_CP050862.1, NZ_CP027534.1, NZ_CP014316.1, NZ_CP015085.1, NZ_CP018970.1, NZ_CP023826.1, NZ_CP032201.1, NZ_CP023844.1, NZ_CP015138.1, NZ_CP018983.1, NZ_CP018991.1, NZ_CP049077.2, NZ_CP010876.1, NZ_CP036245.1, NZ_CP049085.2, NZ_CP035476.1, NZ_CP035477.1, NZ_CP014522.1, NZ_CP014495.1, NZ_CP024720.1, NZ_CP024717.1, NZ_CP021207.1, NZ_CP019008.1, NZ_CP019020.1, NZ_CP035498.1, NZ_CP053245.1, NZ_CP037449.1, NZ_CP048304.1, NZ_CP048920.1, NZ_CP040456.1, NZ_CP024886.1, NZ_CP051700.1, NZ_CP030111.1, NZ_AP022650.1, NZ_CP053251.2, NZ_CP051688.1, NZ_CP033762.1, NZ_CP019273.1, NZ_AP017610.1, NZ_CP033850.1, NZ_CP019029.1, NZ_CP015834.1, NZ_CP009859.1, NZ_CP040919.1, NZ_CP023366.1, NZ_CP041300.1, NZ_CP033605.1, NZ_CP041452.1, NZ_CP041448.1, NZ_CP028166.1, NZ_AP021896.1, NZ_CP031833.1

Thanks,
Harsh
