Integrating inline ML emulators with Fortran E3SM #7545
Replies: 8 comments 19 replies
-
To start the discussion, here is a core dump of what I know. I am just an old scientist, so I may be horribly uninformed. Please correct me and add.
Here is what I have learned. Examples I know of, and some comments:
- pytorch-fortran: a Fortran wrapper for reading TorchScript files, but it has not been updated in a while.
- FTorch (Cambridge): actively developed now, and for ESMs (there is now an implementation in CESM).
- [Fiats](https://github.com/berkeleylab/fiats) (Berkeley): this can do training in Fortran (can it read networks trained in other frameworks like PyTorch?).

Do people know of other methods? Who would like to share what they are doing? What else is going on? What methods are people using? Can we develop a pro/con list and some broad consensus on a standard way forward? For inference, is FTorch the way to go? (Training could be done with Fiats.)
-
Thanks for starting the discussion @andrewgettelman. My objective is to train NNs to emulate some of our physics processes in ELM-FATES. Photosynthesis, for example, is our most expensive calculation, so I'd like to see if I can speed up our model by having an alternative NN-based solve. So far I have used PyTorch to train a few different network architectures. In this vignette I:
The step I'm currently working on is to export the trained model from PyTorch and then import it into ELM-FATES for the inference (forward) steps. @rouson and I plan to intercompare FTorch and Fiats.
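For the Fortran side of that handoff, here is a rough sketch of what the inference call might look like with FTorch, loosely following its SimpleNet example. The model file name and array sizes are placeholders, and the exact procedure names and argument lists differ between FTorch releases (older versions use `torch_module_*` rather than `torch_model_*`), so treat this as illustrative rather than a working implementation:

```fortran
! Illustrative sketch only: placeholder file name and shapes, and FTorch
! procedure names/arguments vary by release.
program fates_nn_inference_sketch
   use, intrinsic :: iso_fortran_env, only: real32
   use ftorch   ! provides torch_model, torch_tensor, torch_kCPU, ...
   implicit none

   type(torch_model) :: model
   type(torch_tensor), dimension(1) :: inputs, outputs
   integer, parameter :: layout(1) = [1]            ! map Fortran dims onto the Torch tensor
   real(real32), dimension(8), target :: x = 0.0    ! placeholder input features
   real(real32), dimension(1), target :: y          ! placeholder prediction

   ! Load the TorchScript file exported from PyTorch (path is a placeholder).
   call torch_model_load(model, "photosynthesis_net.pt", torch_kCPU)

   ! Wrap the Fortran arrays as Torch tensors and run one forward pass.
   call torch_tensor_from_array(inputs(1),  x, layout, torch_kCPU)
   call torch_tensor_from_array(outputs(1), y, layout, torch_kCPU)
   call torch_model_forward(model, inputs, outputs)

   print *, "NN prediction:", y

   call torch_delete(inputs(1))
   call torch_delete(outputs(1))
   call torch_delete(model)
end program fates_nn_inference_sketch
```

On the Python side the corresponding step is just saving the trained network as TorchScript (torch.jit.script or torch.jit.trace, then saving the scripted module), which is the format FTorch loads.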
-
EAMxx is not Fortran, but just to add a data point: we are planning to add ML emulators to EAMxx, and our planned pipeline is to use models saved as TorchScript files. We are planning to support embedding PyTorch models as well as C++ models generated by "translating" a TorchScript file into an equivalent Kokkos implementation (via the LAPIS package). So, nothing to share on our end for the Fortran-Torch conversation; I just wanted to note that on the C++ atmosphere implementation we are also looking into using (Py)Torch.
-
@ambrad made the good point that FTorch is a wrapper around libtorch, which provides a C++ API. So you only have to build against one thing (libtorch), and both the C++ and Fortran parts of the model could use the ML pieces.
-
Is there any chance we could move this to a call? There's a lot of ground to cover, and I think that might be more efficient than recapitulating everything that my collaborators and I have written elsewhere. If anyone is interested, just click 'like' on this comment and I'll reach out to schedule. We have several papers that include comparisons between three all-Fortran solutions and three Fortran APIs with C++ back ends, including FTorch. The introduction to our recent workshop paper [1] covers this and includes the use of Fiats for inference with a neural network trained in PyTorch and exported via nexport. Additional examples related to training and inference in the context of atmospheric simulations are in a Jupyter notebook that will appear in conference proceedings soon [2]. Finally, there's a Journal of Open Source Software paper in review [3].
-
Re: Fortran standards. As a practical matter, we are limited to whatever is supported by all of the default compilers used across our supported production and test machines.
-
Okay, I've almost got an FTorch example with EAM running, but I could use a bit of help, so I will ask generally for those who have worked with FTorch (especially hoping @jonbob, @rgknox and/or @andrewdnolan can comment). It seems that I am not able to pass data correctly between the FTorch Fortran code and the libtorch C++ code. When I try to pass the device name/number (torch_kCPU) it gets corrupted and then the libtorch code dies. It seems like I have not compiled FTorch correctly. Would anyone be willing to share how they did it? What I did was this: [...] But I had to update the definition of [...] to [...]. What did I not do correctly here?
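For reference, the data-passing step in the upstream FTorch examples generally looks like the sketch below: the device constant (torch_kCPU) is taken from the ftorch module itself, and a layout array tells FTorch how the column-major Fortran dimensions map onto the Torch tensor. The shapes here are made up, and the procedure names and argument lists vary between FTorch versions, so this is only a sketch of the pattern, not a diagnosis of the build problem:

```fortran
! Sketch of passing a rank-2 Fortran array through FTorch; shapes are arbitrary
! and the interfaces may differ slightly between FTorch releases.
subroutine call_net_2d(model, in_data, out_data)
   use, intrinsic :: iso_fortran_env, only: real32
   use ftorch   ! torch_kCPU and the tensor/model wrappers come from this module
   implicit none
   type(torch_model), intent(in) :: model
   real(real32), intent(in),  target :: in_data(:,:)    ! e.g. (ncol, nfeatures)
   real(real32), intent(out), target :: out_data(:,:)   ! e.g. (ncol, noutputs)

   type(torch_tensor), dimension(1) :: inputs, outputs
   integer, parameter :: layout(2) = [1, 2]   ! keep Fortran dimension order

   ! torch_kCPU is a device-type constant defined by the ftorch module; use it
   ! from there rather than redefining it locally.
   call torch_tensor_from_array(inputs(1),  in_data,  layout, torch_kCPU)
   call torch_tensor_from_array(outputs(1), out_data, layout, torch_kCPU)
   call torch_model_forward(model, inputs, outputs)

   call torch_delete(inputs(1))
   call torch_delete(outputs(1))
end subroutine call_net_2d
```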
-
Just an update. With a bunch of help (from @singhbalwinder, @mahf708, @rgknox, Olawale, and ChatGPT) I was able to get FTorch running in EAM with the latest E3SM master: https://github.com/andrewgettelman/E3SM/tree/SimpleNet_interim_v2 This tag has a version of the FTorch 'SimpleNet' example installed inside the P3 microphysics in EAM. It loads a model in the init step, then calls it at run time. It's for GNU and CPU only right now (that was how I was able to get everything to build and link). FTorch seems to be a bit dependent on compiler and library versions, but if I can eventually sort that out, someone who knows what they are doing can probably produce a more robust solution. I will integrate this with the warm-rain networks for cloud microphysics in E3SM, and then we should be able to make a network available to both the Fortran and C++ code, as discussed with @agsalin.
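For anyone wanting to mirror that init/run split, the sketch below shows the general shape of the pattern: a module-level FTorch model handle loaded once at initialization and reused every physics time step. The module and routine names are invented for illustration (they are not the names used in the tag above), and FTorch procedure names vary between releases:

```fortran
! Hypothetical module: names are illustrative only, not taken from the
! SimpleNet_interim_v2 tag. Shows "load at init, call at run time".
module p3_nn_sketch
   use, intrinsic :: iso_fortran_env, only: real32
   use ftorch
   implicit none
   private
   public :: p3_nn_init, p3_nn_run

   type(torch_model), save :: nn_model     ! persists between time steps
   logical, save :: nn_loaded = .false.

contains

   subroutine p3_nn_init(model_path)
      ! Called once from the physics init phase.
      character(len=*), intent(in) :: model_path
      call torch_model_load(nn_model, trim(model_path), torch_kCPU)
      nn_loaded = .true.
   end subroutine p3_nn_init

   subroutine p3_nn_run(x, y)
      ! Called each time step; x and y are placeholder 1-D state vectors.
      real(real32), intent(in),  target :: x(:)
      real(real32), intent(out), target :: y(:)
      type(torch_tensor), dimension(1) :: inputs, outputs
      integer, parameter :: layout(1) = [1]

      if (.not. nn_loaded) error stop "p3_nn_init must be called before p3_nn_run"

      call torch_tensor_from_array(inputs(1),  x, layout, torch_kCPU)
      call torch_tensor_from_array(outputs(1), y, layout, torch_kCPU)
      call torch_model_forward(nn_model, inputs, outputs)

      call torch_delete(inputs(1))
      call torch_delete(outputs(1))
   end subroutine p3_nn_run

end module p3_nn_sketch
```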
-
Several groups have, or are developing, implementations that use machine learning emulators (neural networks) in the Fortran E3SM code. The E3SM project is focusing on the C++ code bases for emulation; there is, however, a need for projects using Fortran E3SM to be able to add ML components. Ideally these could be added in a format that the C++ code could also use (the same emulator, but with a C++ interface).
What are people doing in this regard?
There has been some discussion of ongoing efforts, and this thread is intended as a central place to collect and share information on what is happening.