Description
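At the minute the GPU/differentiable path is Zygote-compatible and hence uses non-mutating broadcasted operations, since Zygote does not support array mutation. This works, but it is rather slow and very memory-intensive on the GPU, as each broadcast allocates new intermediate arrays.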
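To illustrate where the memory goes, the sketch below contrasts the two styles on the CPU, with plain Python lists standing in for GPU arrays. It does not reproduce Zygote, Enzyme or GPU behaviour, and the function names are hypothetical. The non-mutating version builds a full N x N table of pair force vectors before reducing it, while the in-place version accumulates into one force array. The in-place style is the one Zygote cannot differentiate but Enzyme can, since Enzyme supports mutation.

```python
def lj_factor(r2, epsilon=1.0, sigma=1.0):
    # Scalar f such that the Lennard-Jones force on atom i from atom j is
    # f * (r_i - r_j), from V(r) = 4*epsilon*((sigma/r)^12 - (sigma/r)^6)
    sr6 = (sigma * sigma / r2) ** 3
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2


def pair_force(ci, cj):
    # Force vector on the atom at ci due to the atom at cj
    dr = [a - b for a, b in zip(ci, cj)]
    f = lj_factor(sum(d * d for d in dr))
    return tuple(f * d for d in dr)


def forces_non_mutating(coords):
    # Broadcast-like style: build a brand new N x N table of pair forces
    # (zero on the diagonal), then reduce each row. Nothing is mutated,
    # but the intermediate table holds N * N * 3 numbers.
    n = len(coords)
    table = [
        [pair_force(coords[i], coords[j]) if i != j else (0.0, 0.0, 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    forces = [tuple(sum(p[k] for p in row) for k in range(3)) for row in table]
    return forces, n * n * 3  # forces and size of the intermediate table


def forces_in_place(coords):
    # Mutating style: one N x 3 force array, updated in place for each pair
    n = len(coords)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            fij = pair_force(coords[i], coords[j])
            for k in range(3):
                forces[i][k] += fij[k]
                forces[j][k] -= fij[k]  # Newton's third law
    return forces
```

Both functions return the same forces, but only the first allocates storage that grows with the square of the number of atoms.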
Long term, the plan is to switch to Enzyme-compatible GPU kernels that calculate and sum the forces using the neighbour list. This should be much faster both with and without gradients, and should help us approach the speed of existing MD software. These kernels could be used within the existing general interaction interface, or a new interface could be built around them. Enzyme and Zygote can be used together, so it should be possible to replace only the force summation and retain the rest of the package's functionality.
One consideration is how general such kernels should be. A general pairwise force summation kernel for user-defined force functions would handle Lennard-Jones and Coulomb interactions, and hence would cover the non-bonded interactions needed for macromolecular simulation. Other, more specialised multi-body kernels could live in Molly or elsewhere, depending on how generic they are.
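As a rough sketch of what "general pairwise kernel plus user-defined force functions" could mean, the serial Python below loops over a neighbour list of (i, j) pairs and sums whatever force functions it is given. The signature `force_fn(i, j, r2)` returning a scalar `f` (with force on i equal to `f * (r_i - r_j)`) is a hypothetical choice for illustration, not Molly's interface. On a GPU each pair would map to a thread, and the scatter into `forces[j]` would typically need atomic additions.

```python
import math


def sum_pairwise_forces(coords, neighbours, force_fns):
    # Generic pairwise summation over a half neighbour list of (i, j) pairs.
    # Each hypothetical force_fn(i, j, r2) returns a scalar f such that the
    # force on atom i is f * (r_i - r_j); the force on j is the negative.
    forces = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j in neighbours:
        dr = [a - b for a, b in zip(coords[i], coords[j])]
        r2 = sum(d * d for d in dr)
        f = sum(fn(i, j, r2) for fn in force_fns)
        for k in range(3):
            forces[i][k] += f * dr[k]
            forces[j][k] -= f * dr[k]
    return forces


def lennard_jones(epsilon, sigma):
    # Force function for V(r) = 4*epsilon*((sigma/r)^12 - (sigma/r)^6)
    def fn(i, j, r2):
        sr6 = (sigma * sigma / r2) ** 3
        return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
    return fn


def coulomb(charges, k_e=1.0):
    # Force function for V(r) = k_e * q_i * q_j / r
    def fn(i, j, r2):
        return k_e * charges[i] * charges[j] / (r2 * math.sqrt(r2))
    return fn
```

For example, `sum_pairwise_forces(coords, [(0, 1)], [lennard_jones(1.0, 1.0), coulomb(charges)])` evaluates both interactions in a single pass over the pairs, which is where such a kernel would save work compared with one pass per interaction.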
Another question is how best to store the neighbour list (calculating the neighbour list can also be GPU-accelerated, but that is a somewhat separate issue).
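One storage option, sketched below as an assumption rather than a proposal from this issue, is a compressed sparse row (CSR) layout: a flat array of neighbour indices plus an `offsets` array where the neighbours of atom i are `neighbours[offsets[i]:offsets[i+1]]`. Storing the full list (both i to j and j to i) lets each "thread" handle one atom and write only its own force, avoiding atomic additions at the cost of evaluating every pair twice. A half list of (i, j) pairs with i < j halves the work but needs scatter updates. The brute-force build here is O(N^2) and for illustration only; cell lists, as in CellListMap.jl, are the usual way to build it efficiently. Function names are hypothetical.

```python
def build_csr_neighbour_list(coords, cutoff):
    # Full neighbour list in CSR form, built by brute force for illustration
    cutoff2 = cutoff * cutoff
    offsets = [0]
    neighbours = []
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i != j and sum((a - b) ** 2 for a, b in zip(ci, cj)) < cutoff2:
                neighbours.append(j)
        offsets.append(len(neighbours))
    return offsets, neighbours


def gather_forces(coords, offsets, neighbours, force_fn):
    # One "thread" per atom: atom i reads its own slice of the list and writes
    # only its own entry, so no atomic additions would be needed on a GPU.
    # force_fn(r2) returns a scalar f with force on i equal to f * (r_i - r_j).
    forces = []
    for i, ci in enumerate(coords):
        total = [0.0, 0.0, 0.0]
        for j in neighbours[offsets[i]:offsets[i + 1]]:
            dr = [a - b for a, b in zip(ci, coords[j])]
            f = force_fn(sum(d * d for d in dr))
            for k in range(3):
                total[k] += f * dr[k]
        forces.append(tuple(total))
    return forces
```

For three atoms spaced 1.0 apart on a line with a cutoff of 1.5, this gives `offsets == [0, 1, 3, 4]` and `neighbours == [1, 0, 2, 1]`.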
Something to bear in mind is extending a single simulation from one GPU to multiple GPUs. It is probably best to start with one GPU and go from there.
This issue is to track and discuss this development. @leios
Useful links:
- An N-body Lennard-Jones simulator in Julia: https://gist.github.com/AlfTetzlaff/880ad7bf4e1486d8a051580a6262491c
- NVIDIA N-body discussion: https://developer.nvidia.com/gpugems/gpugems3/part-v-physics-simulation/chapter-31-fast-n-body-simulation-cuda
- CellListMap.jl: https://github.com/m3g/CellListMap.jl