
read_pdb: trailing whitespace is not removed from the residue_name column #32

@kamurani

Description

When reading a protein PDB file with read_pdb, trailing whitespace is not stripped from the values in the residue_name column.

Source: AlphaFold DB model of UniProt entry Q05655 (AF-Q05655-F1-model_v2.pdb)

Using the following code:

pdb = read_pdb(pdb_file="AF-Q05655-F1-model_v2.pdb", category_names=['_atom_site'])  # '_atom_site' mirrors the mmCIF category name and is the default
atoms_df = pdb['_atom_site']

# Get values for residue_name
list(atoms_df.residue_name.unique())

This yields:

['MET ',
 'ALA ',
 'PRO ',
 'PHE ',
 'LEU ',
 'ARG ',
 'ILE ',
 'ASN ',
 'SER ',
 'TYR ',
 'GLU ',
 'GLY ',
 'GLN ',
 'ASP ',
 'CYS ',
 'VAL ',
 'LYS ',
 'THR ',
 'TRP ',
 'HIS ']
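For context, here is one way a trailing space like this can arise. In the fixed-width PDB format the residue name occupies columns 18-20 of an ATOM/HETATM record, and column 21 is blank. The snippet below is a minimal sketch, not read_pdb's actual parsing code (which we have not inspected). It shows that slicing one column too far yields 'MET ' instead of 'MET'.

```python
# One ATOM record from a PDB file, truncated after the residue sequence number.
record = "ATOM      1  N   MET A   1"

# PDB columns are 1-indexed and inclusive; Python slices are 0-indexed and end-exclusive.
# Residue name = columns 18-20 -> record[17:20]
correct = record[17:20]

# Reading through column 21 (a blank separator before the chain ID) keeps the space.
too_wide = record[17:21]

print(repr(correct))   # 'MET'
print(repr(too_wide))  # 'MET '
```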

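This whitespace should be trimmed so that filtering works as expected. At the moment, a filter such as atoms_df[atoms_df.residue_name == 'MET'] matches no rows, because every value is 'MET ' rather than 'MET'.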
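Until this is fixed in read_pdb, one workaround is to strip the text columns after loading. This is a minimal sketch that assumes atoms_df is a pandas DataFrame, as the .unique() call suggests. A small hand-built DataFrame stands in for the real one. strip_string_columns is a hypothetical helper, not part of the library, and residue_number is an illustrative column.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Stand-in for atoms_df; residue names carry the same trailing space reported above.
atoms_df = pd.DataFrame({
    "residue_name": ["MET ", "ALA ", "MET "],
    "residue_number": [1, 2, 3],  # hypothetical column, for illustration only
})

# Equality filtering finds nothing while the trailing space is present.
print(len(atoms_df[atoms_df.residue_name == "MET"]))  # 0


def strip_string_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with leading/trailing whitespace removed from text columns."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns holding strings; numeric columns are left as-is.
        if is_string_dtype(out[col]):
            out[col] = out[col].str.strip()
    return out


atoms_df = strip_string_columns(atoms_df)
print(len(atoms_df[atoms_df.residue_name == "MET"]))  # 2
```

A similar step could run inside read_pdb before it returns the DataFrame, so callers never see the padded values. A PR might take roughly that shape, though the exact place depends on how the reader is structured.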

Happy to submit a PR for this.
