Fix several latent bugs in MSA, ranking, and structure-context code #420
Open
mooreneural wants to merge 1 commit into
- colabfold.py: move error_count out of retry loop in submit/status so the retry cap actually triggers (was reset to 0 every iteration, making `if error_count > 5: raise` unreachable)
- all_atom_structure_context.py: offset token_backbone_frame_index by atom_offsets (it stores atom indices), not token_offsets, when merging chains
- token_dist_restraint.py: remove stray extra brackets around key_entity_types tensor that gave it shape (1, n) instead of (n,)
- clashes.py: use ellipsis-form rearrange pattern for atoms_per_chain consistently across both ratio checks
- msa.py: replace in-place mutation of input msa_sequence_source with masked_fill
Fixes 5 small but real bugs found while reviewing the data pipeline, ranking, and feature-generator code. Each is a one- to four-line change.
Fixes
- `colabfold.py`: `error_count = 0` was inside the `while True` retry loop in `submit()` and `status()`, so the counter reset on every iteration and `if error_count > 5: raise` was unreachable. The submit/status helpers would retry forever on persistent server errors. Moved the initialization above the loop. (`download()` was already correct.)
- `all_atom_structure_context.py`: In `AllAtomStructureContext.merge`, `token_backbone_frame_index` (which stores atom indices, see `backbone_atoms_indices` in `data/dataset/structure/utils.py`) was being offset by `token_offsets` rather than `atom_offsets` when merging multiple chains. The sibling fields `token_centre_atom_index` and `token_ref_atom_index` in the same function correctly use `atom_offsets`. Currently latent because the returned `frame_idces` from `get_frames_and_mask` is discarded by `chai1.py`, but it's still a real semantic bug.
- `token_dist_restraint.py`: Stray extra `[ ... ]` around the `key_entity_types` tensor literal gave it shape `(1, n)` instead of `(n,)`, asymmetric with `query_entity_types` defined right above it. Currently unused at the call site, but trivially wrong.
- `ranking/clashes.py`: In `has_inter_chain_clashes`, the two ratio checks used inconsistent einops patterns: `"... c -> ... c 1"` (ellipsis-form) on one line and `"b c -> b 1 c"` (hardcoded `b`) on the next. Functionally equivalent for the 2D batch input used today, but the latter will fail if batch dims ever change. Made both ellipsis-form.
- `features/generators/msa.py`: `MSADataSourceGenerator._generate` mutated the input `msa_sequence_source` tensor in place (`msa_sequence_source[msa_sequence_source.eq(query)] = none`) before reassigning it via the non-in-place `masked_fill` on the next line. Idempotent, so it doesn't cause incorrect output today, but fragile. Replaced with `masked_fill` for consistency.

Test plan
- `python -m py_compile` on all modified files (passes locally)
- `tests/test_inference_dataset.py::test_protein_with_smiles` exercises the multi-chain `merge()` code path and still passes
- `key_entity_types` now has shape `(n,)` matching `query_entity_types`
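
For reviewers who want the `colabfold.py` fix in isolation, here is a minimal sketch of the corrected retry pattern. It is not the code from `colabfold.py`: `poll_with_retry_cap`, `FlakyServer`, and the use of `ConnectionError` are hypothetical names chosen for illustration. Only the placement of `error_count` and the `> 5` cap mirror the change in this PR.

```python
def poll_with_retry_cap(request, max_errors=5):
    """Call `request` until it succeeds, giving up after too many failures."""
    # Initialised once, above the loop, so failures accumulate across retries.
    # Placing this line inside the loop would reset it and disable the cap.
    error_count = 0
    while True:
        try:
            return request()
        except ConnectionError as exc:
            error_count += 1
            if error_count > max_errors:
                raise RuntimeError(
                    f"giving up after {error_count} errors"
                ) from exc


class FlakyServer:
    """Hypothetical endpoint that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("server error")
        return "ok"
```

With the counter above the loop, a server that fails three times still returns normally on the fourth call, while a server that never recovers raises on the sixth consecutive failure instead of retrying forever.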
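
The `all_atom_structure_context.py` fix comes down to which offset a field needs when chains are concatenated: a field whose values are atom indices must shift by the number of atoms in the preceding chains, regardless of how many tokens it has. The sketch below shows this with plain Python lists. The dictionary layout and the names `merge_chains`, `num_atoms`, and `frame_atom_index` are assumptions for illustration and do not reflect the actual `AllAtomStructureContext` API.

```python
from itertools import accumulate


def merge_chains(chains):
    """Concatenate per-chain frames, re-basing atom-valued indices.

    Each chain is a dict with (hypothetical) keys:
      num_atoms        -- number of atoms in the chain
      frame_atom_index -- one (a, b, c) triple of chain-local atom indices per token
    """
    atom_counts = [c["num_atoms"] for c in chains]
    token_counts = [len(c["frame_atom_index"]) for c in chains]
    # Exclusive prefix sums: the offset of chain i is the total size of chains 0..i-1.
    atom_offsets = [0] + list(accumulate(atom_counts))[:-1]
    token_offsets = [0] + list(accumulate(token_counts))[:-1]

    merged_frames = []
    for chain, atom_off in zip(chains, atom_offsets):
        # The values are atom indices, so they shift by the atom offset,
        # even though there is one entry per token.
        merged_frames.extend(
            tuple(i + atom_off for i in frame)
            for frame in chain["frame_atom_index"]
        )
    return merged_frames, atom_offsets, token_offsets


chain_a = {"num_atoms": 8, "frame_atom_index": [(0, 1, 2), (4, 5, 6)]}
chain_b = {"num_atoms": 5, "frame_atom_index": [(0, 1, 2)]}
frames, atom_offsets, token_offsets = merge_chains([chain_a, chain_b])
```

In this example the second chain's frame becomes `(8, 9, 10)`. Shifting it by the token offset (2) instead would give `(2, 3, 4)`, which points at atoms belonging to the first chain.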