Description
Summary
When generating a template mmCIF from a PDB file using MIToS (e.g. selecting a single chain, then writing with MMCIFFile / write_file), the resulting mmCIF can be syntactically valid yet still fail in ColabFold/AlphaFold template mode with:
ValueError: mmCIF file ... is missing required field _chem_comp.id
Root cause
AlphaFold’s mmCIF parser (alphafold/data/mmcif_parsing.py) expects several PDBx/mmCIF categories that are commonly present in wwPDB mmCIF files, notably:
- _chem_comp.id and _chem_comp.type (used to classify monomers as “peptide”)
- _entity_poly_seq.* (polymer sequence)
- _struct_asym.* (mapping between entity IDs and chain IDs)
- plus header fields like _entry.id and _exptl.method
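As a rough illustration, the categories listed above can be checked before handing a file to ColabFold. This is a minimal sketch using only the Python standard library; `missing_fields` and `REQUIRED` are hypothetical names, and the scan is a naive line-based one that assumes data names start a line (it does not handle multi-line text fields that happen to contain such lines).

```python
import re

# Names taken from the list above; entries ending in "." are category
# prefixes, meaning any item of that category counts as present.
REQUIRED = (
    "_chem_comp.id",
    "_chem_comp.type",
    "_entity_poly_seq.",
    "_struct_asym.",
    "_entry.id",
    "_exptl.method",
)

# A data name is assumed to be a "_category.item" token at the start of a line.
_DATA_NAME = re.compile(r"^[ \t]*(_[^\s.]+\.\S+)", re.MULTILINE)


def missing_fields(cif_text):
    """Return the entries of REQUIRED that do not occur in cif_text."""
    names = {match.group(1) for match in _DATA_NAME.finditer(cif_text)}
    missing = []
    for required in REQUIRED:
        if required.endswith("."):
            present = any(name.startswith(required) for name in names)
        else:
            present = required in names
        if not present:
            missing.append(required)
    return missing
```

Running `missing_fields(open(path).read())` on an atom_site-only file would be expected to return every entry of `REQUIRED`.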
MIToS' current mmCIF writing path (via BioStructures.MMCIFDict(residues; ...) and writemmcif) primarily emits _atom_site.* information. In addition, when starting from PDB input, residue identifiers often have empty PDBe numbering, which may lead to _atom_site.label_seq_id being written as "." rather than an integer. AlphaFold’s parser later casts label/auth sequence IDs to integers, so this can be another failure point after _chem_comp is addressed.
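One possible workaround for the "." values is to assign sequential integers per chain before writing. The sketch below is not MIToS or BioStructures API; `renumber_label_seq_ids` is a hypothetical helper that operates on plain tuples. It assumes the observed residues are the whole polymer sequence, so any gap in the PDB (unmodelled residues) is silently closed.

```python
def renumber_label_seq_ids(residue_keys):
    """Map residue keys to 1-based integer label_seq_id values per chain.

    residue_keys is an iterable of (chain_id, auth_seq_id, ins_code) tuples
    in file order; repeated keys (one per atom) are allowed.
    """
    counters = {}  # chain_id -> last number handed out in that chain
    mapping = {}   # residue key -> assigned label_seq_id
    for key in residue_keys:
        if key in mapping:
            continue  # further atoms of a residue that was already numbered
        chain_id = key[0]
        counters[chain_id] = counters.get(chain_id, 0) + 1
        mapping[key] = counters[chain_id]
    return mapping
```

The resulting integers would then be written to _atom_site.label_seq_id in place of ".", and should agree with the numbering used in _entity_poly_seq.num.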
Expected behavior
It would be very helpful if MIToS documentation (or helper utilities) could:
- warn that “atom_site-only” mmCIFs may not be sufficient for AlphaFold/ColabFold templating, and/or
- provide a small utility to “upgrade” a minimal mmCIF to include the minimal additional PDBx categories required by AlphaFold.
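To make the second option concrete, here is a rough sketch of what such an "upgrade" step might emit for a single protein chain. It is written in plain Python rather than against the MIToS API; `extra_categories` and all of its parameters are hypothetical. It assumes every residue is a standard amino acid (glycine typed as "peptide linking", everything else as "L-peptide linking"), that the chain ID matches _atom_site.label_asym_id, and that the caller supplies the experimental method. Whether this set is sufficient for a given AlphaFold/ColabFold version has not been verified here.

```python
def extra_categories(entry_id, chain_id, residue_names, method, entity_id="1"):
    """Return mmCIF text for the non-atom_site categories of one chain.

    residue_names is the chain's sequence as three-letter codes, in order.
    """
    def quote(value):
        # Simplified quoting: wrap values that contain spaces in single quotes.
        return f"'{value}'" if " " in value else value

    lines = [f"_entry.id {quote(entry_id)}", "#",
             f"_exptl.method {quote(method)}", "#"]

    # One _chem_comp row per distinct residue name (assumption: all peptides).
    lines += ["loop_", "_chem_comp.id", "_chem_comp.type"]
    for name in sorted(set(residue_names)):
        comp_type = "peptide linking" if name == "GLY" else "L-peptide linking"
        lines.append(f"{name} {quote(comp_type)}")

    # Polymer sequence, numbered from 1 in the order given.
    lines += ["#", "loop_", "_entity_poly_seq.entity_id",
              "_entity_poly_seq.num", "_entity_poly_seq.mon_id"]
    for num, name in enumerate(residue_names, start=1):
        lines.append(f"{entity_id} {num} {name}")

    # Mapping between the chain ID and its entity ID.
    lines += ["#", "loop_", "_struct_asym.id", "_struct_asym.entity_id",
              f"{chain_id} {entity_id}", "#"]
    return "\n".join(lines) + "\n"
```

The returned text would be placed inside the same data block as the _atom_site loop written by MIToS, for example directly after the data_ line.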
Repro (typical)
- Start from a PDB
- Use MIToS to select a chain / rename residues
- Save as mmCIF using MMCIFFile + write_file
- Use that mmCIF as a ColabFold template → error above