Description
Thanks for providing the Python script to convert .dat files to FASTA. I did, however, find a bug when trying to convert an old UniProt database. uniprot-dat-to-fasta.py assumes that any line that yields only two characters when split must be a line with a tag. As a side effect, any sequence that is only two amino acids long (there are some) causes errors, and large numbers of other sequences get truncated.
A fix is to keep a copy of each line exactly as read from the file, without stripping any whitespace, and check whether that copy starts with 5 spaces. According to the .dat file format, this is another thing that distinguishes sequence lines from tag lines. There is probably a faster solution, but that's what worked for me.
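For reference, here is a minimal sketch of the idea in Python. It is not the actual code from uniprot-dat-to-fasta.py: the function name `iter_fasta_records`, the constant `SEQ_PREFIX`, and the sample entries are made up for illustration, and it assumes that entries end with `//` and that the entry name is the second field of the `ID` line.

```python
import io

# Assumption: sequence lines in the .dat flat file start with five spaces,
# while every other line starts with a two-letter tag (ID, AC, SQ, ...).
SEQ_PREFIX = " " * 5


def iter_fasta_records(handle):
    """Yield (entry_name, sequence) pairs from a UniProt-style .dat stream.

    Hypothetical helper, not the original script's logic.
    """
    entry_name = None
    residues = []
    for raw_line in handle:
        # Strip only the line ending so the leading spaces survive.
        line = raw_line.rstrip("\r\n")
        if line.startswith("//"):
            # End of entry: emit what was collected and reset.
            if entry_name is not None:
                yield entry_name, "".join(residues)
            entry_name, residues = None, []
        elif line.startswith(SEQ_PREFIX):
            # Sequence line: drop the spaces between residue blocks.
            residues.append("".join(line.split()))
        elif line.startswith("ID"):
            fields = line.split()
            if len(fields) > 1:
                entry_name = fields[1]


# Synthetic example with a two-residue sequence and a sequence whose
# last line holds only two residues.
sample = io.StringIO(
    "ID   TEST1   Reviewed;   2 AA.\n"
    "SQ   SEQUENCE   2 AA;\n"
    "     MK\n"
    "//\n"
    "ID   TEST2   Reviewed;   12 AA.\n"
    "SQ   SEQUENCE   12 AA;\n"
    "     MAFSAEDVLK\n"
    "     EK\n"
    "//\n"
)
for name, seq in iter_fasta_records(sample):
    print(f">{name}\n{seq}")
```

Because the check is done on the unstripped line, a line such as `     MK` is treated as sequence even though it contains only two characters once the whitespace is removed.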