Description
Is your feature request related to a problem? Please describe.
A recurring issue with alchemical free energy calculations in SOMD2 is that trajectories occasionally terminate early because a 'NaN' is generated after an integration step. We have also seen trajectories with transient spikes in non-bonded energies that we would expect to cause a numerical integration error.
Because the issue is stochastic and rare, it is difficult to isolate the source of the error.
Describe the solution you'd like
A 'debug' mode that buffers coordinates and energies for the past N integration time-steps would be helpful. The code could be updated to write this information in molecular file formats, allowing visualisation of the trajectory in the few steps immediately before a crash occurs.
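To make the request concrete, here is a minimal sketch of the kind of fixed-size buffer meant above, written in plain Python. The names `Frame` and `FrameBuffer` are hypothetical and are not part of SOMD2 or its dependencies; the sketch only illustrates that keeping the last N frames needs bounded memory and no disk access.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One buffered snapshot (hypothetical structure, for illustration only)."""
    step: int
    energy: float   # e.g. the potential energy reported for this step
    coords: tuple   # one (x, y, z) tuple per atom


class FrameBuffer:
    """Keep only the most recent `n_frames` snapshots in memory."""

    def __init__(self, n_frames):
        # A deque with maxlen drops the oldest entry automatically on append.
        self._frames = deque(maxlen=n_frames)

    def push(self, step, energy, coords):
        # Copy the coordinates so later in-place updates cannot alter the history.
        self._frames.append(Frame(step, energy, tuple(tuple(c) for c in coords)))

    def frames(self):
        # Return the buffered snapshots, oldest first.
        return list(self._frames)


buffer = FrameBuffer(n_frames=3)
for step in range(10):
    # Toy data standing in for the output of a real integration step.
    buffer.push(step, energy=-100.0 + step, coords=[(0.0, 0.0, 0.1 * step)])

steps_kept = [f.step for f in buffer.frames()]
print(steps_kept)  # [7, 8, 9]
```

In a real implementation the per-step copy would presumably live next to the integrator rather than in Python, but the bookkeeping would be the same.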
Describe alternatives you've considered
In principle, this could be implemented at the Python API level by adding extra logic to save/overwrite snapshots after every MD time-step. However, this would likely be very slow and would make it difficult to reproduce NaN crashes in a timely manner.
We could, however, buffer coordinates and forces internally and write them to disk only when a crash has been triggered. There is already low-level logic in the code that attempts to deal with NaN errors by performing energy minimisation. Some compromise on speed (a few fold) would be acceptable for troubleshooting purposes.
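A rough sketch of this "buffer in memory, write only on failure" idea is given below. Everything in it is an assumption made for illustration: `step_fn` is a hypothetical stand-in for a single integration step, `toy_step` fakes a NaN at step 7, and the multi-frame XYZ output with a placeholder element symbol is just one simple format that molecular viewers can typically read. It does not use any SOMD2 API; forces could be buffered in the same way as coordinates.

```python
import math
import os
import tempfile
from collections import deque


def write_xyz(path, frames, element="X"):
    """Write buffered frames as a multi-frame XYZ file, oldest first."""
    with open(path, "w") as fh:
        for step, energy, coords in frames:
            fh.write(f"{len(coords)}\n")                  # atom count line
            fh.write(f"step={step} energy={energy}\n")    # comment line
            for x, y, z in coords:
                fh.write(f"{element} {x:.6f} {y:.6f} {z:.6f}\n")


def run_with_crash_dump(step_fn, n_steps, n_buffer, dump_path):
    """Run `step_fn` repeatedly; dump the last `n_buffer` frames on a non-finite value.

    Returns the step at which the failure was detected, or None if none occurred.
    """
    history = deque(maxlen=n_buffer)  # nothing touches the disk while this fills
    for step in range(n_steps):
        energy, coords = step_fn(step)
        history.append((step, energy, [tuple(c) for c in coords]))
        finite = math.isfinite(energy) and all(
            math.isfinite(v) for c in coords for v in c
        )
        if not finite:
            # Only now pay the I/O cost, including the offending frame itself.
            write_xyz(dump_path, history)
            return step
    return None


def toy_step(step):
    # Hypothetical integrator stand-in: the energy becomes NaN at step 7.
    energy = float("nan") if step == 7 else -100.0
    return energy, [(0.0, 0.0, 0.1 * step)]


dump_path = os.path.join(tempfile.mkdtemp(), "crash_dump.xyz")
failed_at = run_with_crash_dump(toy_step, n_steps=20, n_buffer=3, dump_path=dump_path)
print(failed_at)  # 7
```

With this layout the per-step overhead is one in-memory copy plus a finiteness check, which seems compatible with the few-fold slowdown considered acceptable above.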