Bug in uniprot-dat-to-fasta.py

Thanks for providing the Python script to convert .dat files to FASTA. I did, however, find a bug when trying to convert an old UniProt database. uniprot-dat-to-fasta.py assumes that any line whose split yields only two characters must be a line with a tag. As a side effect, any sequence consisting of just two amino acids (there are some) causes errors, and large numbers of other sequences are truncated.
A fix is to keep a copy of the original line as read from the file, without stripping any whitespace, and check whether that line starts with 5 spaces. According to the .dat file format, this is another feature that distinguishes sequence lines from tag lines. There is probably a faster solution, but that's what worked for me.
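
For reference, here is a minimal sketch of the check described above. It is not the code from uniprot-dat-to-fasta.py; the function names `is_sequence_line` and `iter_sequences` are hypothetical, and I'm assuming each entry ends with a `//` line and that the first accession on the first `AC` line is a good enough identifier.

```python
def is_sequence_line(raw_line):
    """Return True if the unstripped line looks like a sequence data line.

    Tag lines start with a two-letter line code (ID, AC, SQ, ...), while
    sequence data lines start with five spaces.
    """
    return raw_line.startswith("     ")


def iter_sequences(lines):
    """Yield (accession, sequence) pairs from an iterable of raw .dat lines.

    Hypothetical helper: takes the first accession from the first AC line
    of each entry and treats '//' as the end of the entry.
    """
    accession = None
    chunks = []
    for raw_line in lines:
        if is_sequence_line(raw_line):
            # Test the raw line, not a stripped copy, so a two-residue
            # sequence such as "MK" is not mistaken for a tag.
            chunks.append("".join(raw_line.split()))
        elif raw_line.startswith("AC") and accession is None:
            accession = raw_line[5:].split(";")[0].strip()
        elif raw_line.startswith("//"):
            if accession is not None:
                yield accession, "".join(chunks)
            accession = None
            chunks = []
```

Something like `for acc, seq in iter_sequences(open(path)):` could then write out `">" + acc` followed by `seq` for each entry, where `path` is whatever .dat file is being converted.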
