Description
Related Issue #231
Background
The current Model approach to working with SPECFEM Fortran models is to ingest them as NumPy arrays and perform model perturbations with vector multiplication. This works for moderate-sized models (~100k parameters), but quickly becomes a bottleneck for larger models (global models, anisotropic models, etc.). Major bottlenecks occur when loading an entire model into RAM at once (requiring gigabytes of memory) and when performing vector manipulations such as multiplying vectors, taking their norms, etc.
Approach
To optimize this we will stop addressing entire models as single vectors and instead loop over the available binary files, which SPECFEM has already split up by NPROC and by parameter. In most cases we do not need access to the entire model vector at once, so this approach should work.
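As a sketch of the per-file approach, a dot product can be accumulated slice by slice so that only one file pair is resident in memory at a time (the flat-float32 layout here is a simplifying assumption; actual SPECFEM binaries also carry Fortran record markers that would need stripping):

```python
import numpy as np

def streamed_dot(files_a, files_b):
    """Accumulate a dot product file pair by file pair, holding only a
    single model slice per side in memory at any time.

    Assumes each file is a flat float32 array (real SPECFEM .bin files
    include Fortran record markers, omitted here for brevity)."""
    total = 0.0
    for fa, fb in zip(files_a, files_b):
        a = np.fromfile(fa, dtype=np.float32)
        b = np.fromfile(fb, dtype=np.float32)
        total += float(np.dot(a, b))
    return total
```

The same accumulation pattern extends to norms and other reductions, since each file's partial result can be combined without ever materializing the full vector.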
This also opens up the ability to parallelize this process over multiple processors with concurrent.futures, speeding up the work and allowing model manipulation to be done on compute nodes. This should also allow further separation of compute-heavy tasks from simple file management.
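A minimal sketch of that parallelization with concurrent.futures, applying an operation to each slice file independently (function and file names are illustrative; a ThreadPoolExecutor is used for the I/O-bound read/write here, and ProcessPoolExecutor would be the drop-in choice for compute-heavy kernels):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def scale_file(path_in, path_out, alpha):
    """Read one NPROC/parameter slice, scale it, write the result."""
    arr = np.fromfile(path_in, dtype=np.float32)
    (alpha * arr).astype(np.float32).tofile(path_out)

def scale_model(files_in, files_out, alpha, max_workers=4):
    """Apply scale_file to every slice concurrently; swap in
    ProcessPoolExecutor for CPU-bound manipulations."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scale_file, fi, fo, alpha)
                   for fi, fo in zip(files_in, files_out)]
        for future in futures:
            future.result()  # re-raise any worker exception
```

Because each file is independent, the per-slice tasks need no synchronization beyond waiting on the futures for completion and error propagation.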
Tasks to complete:
- Move compute-heavy model manipulation tasks into the Model class. Some of this is still contained in the Optimize module, including:
  - Optimize.save_vector()
  - Optimize.load_vector()
  - Optimize.get_stats()
- Remove any leftover storage of Model files as NumPy vectors
- Model.setup() should not read in files immediately, but simply store paths to the files
- Remove any internal storage of parameters in RAM from the Model class
- Add built-in Model functions for manipulating two Models (e.g., dot, multiply, divide). Internally these functions will be optimized for looping and parallelization
- Determine where model manipulation tasks can be incorporated into existing System.run() functions, or written into new System.run() functions
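Taken together, the tasks above could look something like the following hypothetical Model sketch, which stores only file paths at setup time and streams slices on demand (class layout, method names, and the flat-float32 file format are all assumptions, not the actual SeisFlows interface):

```python
import glob
import os

import numpy as np

class Model:
    """Lazy model container: holds paths to SPECFEM-style slice files
    rather than the full parameter vector (illustrative sketch only)."""

    def __init__(self, path, pattern="*.bin"):
        # setup() analogue: record file paths only, keep no arrays in RAM
        self.paths = sorted(glob.glob(os.path.join(path, pattern)))

    def _slices(self):
        """Yield one slice at a time, assuming flat float32 files."""
        for p in self.paths:
            yield np.fromfile(p, dtype=np.float32)

    def dot(self, other):
        """Dot product over two Models, accumulated slice by slice."""
        return sum(float(np.dot(a, b))
                   for a, b in zip(self._slices(), other._slices()))

    def norm(self):
        return self.dot(self) ** 0.5

    def multiply(self, alpha, out_dir):
        """Write alpha-scaled slices into out_dir, one file at a time."""
        os.makedirs(out_dir, exist_ok=True)
        for p in self.paths:
            scaled = alpha * np.fromfile(p, dtype=np.float32)
            scaled.astype(np.float32).tofile(
                os.path.join(out_dir, os.path.basename(p)))
        return Model(out_dir)
```

Since each method touches one file at a time, the per-slice loops in dot() and multiply() are natural candidates for the concurrent.futures dispatch described above, or for execution inside System.run() calls on compute nodes.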