Skip to content

Support importing (some) virtual sites in Interchange.from_openmm #1081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
May 13, 2025

Conversation

mattwthompson
Copy link
Member

@mattwthompson mattwthompson commented Oct 21, 2024

Description

Resolves #1063

@mattwthompson mattwthompson added this to the 0.4.2 milestone Oct 21, 2024
Copy link

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 93.83886% with 13 lines in your changes missing coverage. Please review.

Project coverage is 93.72%. Comparing base (4da84f0) to head (c9c46bd).
Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
...enff/interchange/interop/openmm/_import/_import.py 90.90% 4 Missing ⚠️
openff/interchange/components/toolkit.py 88.46% 3 Missing ⚠️
openff/interchange/drivers/report.py 50.00% 2 Missing ⚠️
openff/interchange/interop/_virtual_sites.py 92.59% 2 Missing ⚠️
...terchange/interop/openmm/_import/_virtual_sites.py 96.55% 1 Missing ⚠️
openff/interchange/smirnoff/_gromacs.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1081      +/-   ##
==========================================
+ Coverage   93.66%   93.72%   +0.05%     
==========================================
  Files          71       72       +1     
  Lines        6095     6216     +121     
==========================================
+ Hits         5709     5826     +117     
- Misses        386      390       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mattwthompson mattwthompson marked this pull request as ready for review October 24, 2024 17:21
Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this on a hand-built OpenMM Simulation featuring a single water molecule with made up parameters, and also on a single Tip4P molecule. Things mostly worked, and errors were mostly clear. I do have a few notes:

  • export INTERCHANGE_EXPERIMENTAL=1 seems to be required, but this does not seem to be documented and the error message doesn't make this clear. This was the only time I was confused by an error message.
  • At the moment, the number of atoms in the imported topology must match the number of particles in the system. I think we should consider if topologies matching the number of real atoms in the system should be accepted as well. This would allow OpenFF Topology objects to work when importing systems with supported virtual sites, as they currently do for systems without virtual sites. This makes particular sense given that it appears that virtual sites are stripped out of the OpenFF topology that's available from interchange.topology.
  • The exported OpenMM topology via to_openmm_simulation() had its virtual sites in a separate chain and residue - I think it'd be better if they were in the same chain and residue as the parent atom.
  • This is unrelated to the PR, but I tried a hand-built system without bonds (single water molecule with constraints, but no bonds force), and Interchange complained:
    136 # TODO: Does this run through the Interchange.box validator?
    137 interchange.box = _box_vectors
--> 139 if interchange.topology.n_bonds > len(interchange.collections["Bonds"].key_map):
    140     # There are probably missing (physics) bonds from rigid waters. The topological
    141     # bonds are probably processed correctly.
    142     _fill_in_rigid_water_bonds(interchange)
    144 return interchange

KeyError: 'Bonds'

Since this code currently attempts to account for bond parameters missing from the bond parameters, maybe we should account for this possibility?

A full review for such a big patch is gonna take me some time to work through, especially since I'm gonna need to get my head around how everything works, but those are my notes from some independent testing.

But yeah this is great! Things seem to be working pretty robustly once you come up with a system and topology that works - I was able to swap the topology of simple molecules out for a "real" openff topology after the Interchange was constructed and things kept working. Round trips work, energy minimization works from Interchange land, and visualizations work perfectly.

Comment on lines +190 to +200
topology._molecule_virtual_site_map = defaultdict(list)

# TODO: This iteration strategy scales horribly with system size - need to refactor -
# since looking up topology atom indices is slow. It's probably repetitive to
# look up the molecule index over and over again
for particle in virtual_sites:
molecule_index = topology.molecule_index(topology.atom(particle.orientation_atom_indices[0]).molecule)

topology._molecule_virtual_site_map[molecule_index].append(particle)

topology._particle_map = openmm_openff_particle_map
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to squish topology._particle_map and topology._molecule_virtual_site_map into atom metadata rather than monkey patching? I think the answer is probably "no", and I see this information is moved to the Interchange, and I've tested that Interchange.topology seems to be able to be swapped out for a true topology (with full molecular graphs etc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly; I don't love the current design but it has the benefit of not interacting with anything the user might do in the public API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, I think you could make the return type of this function something like -> tuple[Topology, defaultdict, dict], make these attributes on Interchange that get set in the parent function from that return value, and then pass them in as additional arguments to _convert_vsites(). I think that covers all the information flow needed? Just need to remove some .topologys from a few other places.

@mattwthompson
Copy link
Member Author

Thanks for the review

export INTERCHANGE_EXPERIMENTAL=1 seems to be required

This shouldn't be the case in the 0.4.x line, possibly your local setup is out of date. I've removed some obsolete stuff in tests now that Interchange.from_openmm is not "experimental."

I think we should consider if topologies matching the number of real atoms in the system should be accepted as well

Agree, but something for future work. I want to get this functionality over the finish line in its current scope as soon as I can

virtual sites in a separate chain and residue

Right, this was the result of #1051

we should account for this possibility

Probably yeah, topological edges being defined different ways is pretty frustrating to work with

Co-authored-by: Josh A. Mitchell <[email protected]>
@mattwthompson
Copy link
Member Author

I think this is ready for another pass;

My hope is that this is more or less good to go in its current state, but if there are more changes needed before it finds a way into a release that's fine. OpenFE is a key stakeholder for this feature but I don't think there's significant time pressure from them.

I think everything else is either out of the original scope or something I forgot!

Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an enormous amount of code that lays the basis for a lot of future work and the fact that it works reflects a pretty amazing effort!

I checked this with a notebook I wrote. First, it creates a single "water" molecule topology and system by hand and imports that, then round trip exports it. Then I created a small box of OPC water (because it uses a 3 particle average virtual site) with Modeller, imported that, and roundtripped it. Notebook here - just rename to .ipynb: openmm_vsite_import.txt
Here's what I found:

  • The modeller-created system worked perfectly and was easy to load, visualize, and export. This is probably the most fair and relevant test, and I imagine should represent the vast majority of OpenMM system use.
  • My hand-built system imported and visualized fine, but when I roundtripped it back to OpenMM the resulting system didn't have any particles. I wonder if this had something to do with the fact that it didn't have any forces. This was really not a fair test or something anyone would ever do, but hey if that's worth fixing then there it is.
  • The topology argument of Interchange.from_openmm is optional (has a default value, has None as a possible type, and is described as optional in the docstring), but a value of None unconditionally raises an error - this should just be a required argument. Not a blocker.
  • An OpenFF topology can't be used, because it can't contain vsites and therefore has the wrong number of atoms - I mentioned this in my previous review and I think it got discussed as if I was talking about positions, but specifically I think that a topology that matches the number of real atoms should be accepted. This allows users to include complete chemical information instead of leaning on _SimpleMolecules etc. Not a blocker here but definitely a nice to have.

Notes from looking at the code:

  • LocalCoordinatesSite is in _SUPPORTED_VIRTUAL_SITE_TYPES, but is not supported - I've added a note on the code below
  • I have some notes on the docs page
  • Users can currently provide OpenFF topologies when importing OpenMM topologies - I've used this in vignettes to demonstrate how users can take advantage of having full chemical information even when constructing an interchange from OpenMM. We should not regress this by having OpenFF topologies raise an AttributeError, even if we do not support this use case for systems with virtual sites. More details below.

All that said, I think this is just about ready to go and most of the above feedback can be addressed in future PRs. The three blockers I've identified so far are the suggestions I made to edges.md, removing LocalCoordinatesSite from _SUPPORTED_VIRTUAL_SITE_TYPES, and addressing the regression with OpenFF topologies.

I haven't looked in great detail at the guts of the collection or export code, and I'm happy to do that tomorrow - my eyes are just bouncing off it right now and its the end of my day. So I'll finish this off tomorrow - I expect to approve it, but such a detailed PR deserves a full review.

Comment on lines +76 to +77
topology._molecule_virtual_site_map = defaultdict(list)
topology._particle_map = {index: index for index in range(topology.n_atoms)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the fact that this is now necessary a regression? from_openmm used to support giving an OpenFF topology so that users could specify/record the precise chemical identity of stuff, even when it came from OpenMM-land. It seems bad that they now need to invoke this arcana for that to work or else get a cryptic message - I think topologies that lack these attributes should continue to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to make this change - even though it's probably a regression, this behavior was never tested. I'd like to come back to it and add tests / fix anything that this breaks. And I can hardly add more to this PR in its current state so I must do it later

#1218

Comment on lines +190 to +200
topology._molecule_virtual_site_map = defaultdict(list)

# TODO: This iteration strategy scales horribly with system size - need to refactor -
# since looking up topology atom indices is slow. It's probably repetitive to
# look up the molecule index over and over again
for particle in virtual_sites:
molecule_index = topology.molecule_index(topology.atom(particle.orientation_atom_indices[0]).molecule)

topology._molecule_virtual_site_map[molecule_index].append(particle)

topology._particle_map = openmm_openff_particle_map
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, I think you could make the return type of this function something like -> tuple[Topology, defaultdict, dict], make these attributes on Interchange that get set in the parent function from that return value, and then pass them in as additional arguments to _convert_vsites(). I think that covers all the information flow needed? Just need to remove some .topologys from a few other places.

Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal code all looks good! I have suggested revisions, mostly to user-facing code and docs, but only the three points I raised in my previous comments are what I'd consider blocking, and I don't think any of them should need another round of review. Thanks for this feature, it opens the way for exciting things!

@mattwthompson mattwthompson merged commit b8e929a into main May 13, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support of virtual sites for creating an Interchange from_openmm
2 participants