I have encountered an issue when reading in a protein PDB file: trailing whitespace is not stripped from the residue names.
Source: AlphaFold model AF-Q05655-F1 (UniProt accession Q05655)
Using the following code:
# '_atom_site' mirrors the mmCIF category name and is the default
pdb = read_pdb(pdb_file="AF-Q05655-F1-model_v2.pdb", category_names=['_atom_site'])
atoms_df = pdb['_atom_site']
# Get values for residue_name
list(atoms_df.residue_name.unique())
This yields:
['MET ',
'ALA ',
'PRO ',
'PHE ',
'LEU ',
'ARG ',
'ILE ',
'ASN ',
'SER ',
'TYR ',
'GLU ',
'GLY ',
'GLN ',
'ASP ',
'CYS ',
'VAL ',
'LYS ',
'THR ',
'TRP ',
'HIS ']
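This whitespace should be trimmed so that filtering works as expected. As it stands, a comparison such as atoms_df.residue_name == 'MET' matches no rows, because the stored value is 'MET ' with a trailing space.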
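In the meantime, a workaround on the user side could look roughly like the sketch below. It does not touch the reader itself; it only post-processes the resulting DataFrame with pandas. The small `atoms_df` here is a hand-built stand-in for `pdb['_atom_site']`, and `strip_columns` is a hypothetical helper name, not part of the library.

```python
import pandas as pd

# Hand-built stand-in for pdb['_atom_site'] (assumption: the real frame has
# a string column called residue_name, as in the output above).
atoms_df = pd.DataFrame({
    "residue_name": ["MET ", "ALA ", "MET "],
    "residue_number": [1, 2, 3],
})


def strip_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with surrounding whitespace removed from the given string columns."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.strip()
    return out


# Hypothetical helper applied to the affected column only.
cleaned = strip_columns(atoms_df, ["residue_name"])

# Filtering on the bare residue name now matches rows.
met_rows = cleaned[cleaned["residue_name"] == "MET"]
unique_names = list(cleaned["residue_name"].unique())
```

Doing the same strip inside the reader, at the point where the fixed-width fields are parsed, would presumably make this step unnecessary.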
Happy to submit a PR for this.