📰 Custom Issue
There are currently problems with handling data that is too large for memory (#310, #246). One way to work around this is, instead of building a single regridder to handle the whole source and target grid/mesh, to build many smaller regridders, each responsible for some section of the source and target grid/mesh. A `Partition` object (name to be decided) could handle the building, saving and application of such regridders.
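
As a rough illustration of the idea, the sketch below treats regridding as multiplication by a weight matrix and checks that several smaller regridders, each covering a subset of the target points and the source points they depend on, reproduce the result of one large regridder. Everything here is a toy assumption: the dense list-of-lists weights and the hypothetical `apply_weights` and `sub_regridder` helpers stand in for real regridders and are not part of any existing API.

```python
# Toy model: a "regridder" is a dense weight matrix (target rows x source columns).
# All names here are hypothetical and only illustrate the partitioning idea.

def apply_weights(weights, data):
    """Regrid 1D source data by multiplying it with a weight matrix."""
    return [sum(w * d for w, d in zip(row, data)) for row in weights]


def sub_regridder(weights, src_idx, tgt_idx):
    """Extract the block of weights linking a source subset to a target subset."""
    return [[weights[t][s] for s in src_idx] for t in tgt_idx]


# 4 source points averaged pairwise onto 2 target points.
full_weights = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
source_data = [1.0, 3.0, 10.0, 20.0]

# Each pair is (source indices, target indices); the source subset must
# contain every source point that contributes to the target subset.
partition = [([0, 1], [0]), ([2, 3], [1])]

result = [None] * len(full_weights)
for src_idx, tgt_idx in partition:
    block = sub_regridder(full_weights, src_idx, tgt_idx)  # one small regridder
    piece = apply_weights(block, [source_data[s] for s in src_idx])
    for t, value in zip(tgt_idx, piece):
        result[t] = value

# The pieces together match the single large regridder.
assert result == apply_weights(full_weights, source_data)
```

In practice the aim would be to never build `full_weights` at all; it is only constructed here so the partitioned result can be checked against it.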
The Partition object would have the following functionality:
- Can be initialised by passing a source grid/mesh, a target grid/mesh and some collection of indices which describes the source and target subsets.
  - It may be possible in future to automatically determine an appropriate collection of subsets; however, this shouldn't be necessary for a minimum viable solution.
  - It may also be necessary to pass in explicit information about the dask chunking of the input object (and the desired chunking for the output object).
- There should be some level of error checking to ensure that the partition makes sense, i.e. that the entire source/target is covered and that, for each pair of source/target indices, the source points cover the target points.
  - It may also be worth checking that the calculation would be manageable for the given chunking strategy, and at least raising a warning if it would not be.
- There should be a method for generating regridders (a regridder from the 1st source indices to the 1st target indices, the same for the 2nd, etc.) and saving these regridders to a user-supplied series of paths. Doing so will mean that only one regridder will need to be realised in memory at a time. The `Partition` object should be able to keep track of the paths to the appropriate regridders.
  - It should also be possible to give a `Partition` object access to the paths of previously saved regridders, via initialisation or some other method.
- When this object has access to a full set of saved regridders, it is able to apply them in order to lazily regrid data from the source grid/mesh as if it were a regular regridder. When a chunk of data is realised, it will load the appropriate regridders, perform regridding on that chunk and then delete those regridders. In this way, only a limited number of regridders will need to be loaded into memory at any one time.
  - One problem we may have to solve is how to realise multiple chunks in the same vertical stack, which all use the same regridder, without having to load that regridder multiple times.
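
The functionality listed above could be sketched roughly as follows. This is a minimal, eager, pure-Python sketch and not a proposed implementation: the `Partition` signature, the `generate_files` and `regrid` method names, the `build_weights` helper and the use of `pickle` for saving are all assumptions made for illustration, and 1D point coordinates with cell bounds stand in for real grids/meshes.

```python
import pickle
import tempfile
from pathlib import Path


def build_weights(src_coords, tgt_bounds):
    """Stand-in regridder builder: average the source points inside each cell."""
    weights = []
    for lo, hi in tgt_bounds:
        inside = [1.0 if lo <= x < hi else 0.0 for x in src_coords]
        total = sum(inside)
        weights.append([w / total for w in inside])
    return weights


class Partition:
    """Hypothetical sketch: many small regridders, held in memory one at a time."""

    def __init__(self, src_coords, tgt_bounds, index_pairs, paths=None):
        self.src_coords = src_coords
        self.tgt_bounds = tgt_bounds
        self.index_pairs = index_pairs  # list of (source indices, target indices)
        # Paths of previously saved regridders may be supplied up front.
        self.paths = [Path(p) for p in paths] if paths is not None else None
        self._check()

    def _check(self):
        """Check that the partition covers everything and each pair is consistent."""
        src_seen = {s for src, _ in self.index_pairs for s in src}
        if src_seen != set(range(len(self.src_coords))):
            raise ValueError("source subsets must cover every source point")
        tgt_seen = sorted(t for _, tgt in self.index_pairs for t in tgt)
        if tgt_seen != list(range(len(self.tgt_bounds))):
            raise ValueError("target subsets must cover each target cell exactly once")
        for src, tgt in self.index_pairs:
            for t in tgt:
                lo, hi = self.tgt_bounds[t]
                if not any(lo <= self.src_coords[s] < hi for s in src):
                    raise ValueError(f"source subset does not cover target cell {t}")

    def generate_files(self, paths):
        """Build and save each regridder, keeping only one in memory at a time."""
        paths = [Path(p) for p in paths]
        if len(paths) != len(self.index_pairs):
            raise ValueError("need exactly one path per source/target subset pair")
        for path, (src, tgt) in zip(paths, self.index_pairs):
            weights = build_weights(
                [self.src_coords[s] for s in src],
                [self.tgt_bounds[t] for t in tgt],
            )
            with open(path, "wb") as f:
                pickle.dump(weights, f)  # assumed stand-in for a real save format
            del weights
        self.paths = paths

    def regrid(self, levels):
        """Regrid a stack of 1D levels, loading each saved regridder only once."""
        if self.paths is None:
            raise RuntimeError("no saved regridders: call generate_files() first")
        result = [[None] * len(self.tgt_bounds) for _ in levels]
        for path, (src, tgt) in zip(self.paths, self.index_pairs):
            with open(path, "rb") as f:
                weights = pickle.load(f)  # the only regridder in memory
            # Reuse the loaded regridder for every level in the vertical stack.
            for level, out in zip(levels, result):
                sub = [level[s] for s in src]
                for t, row in zip(tgt, weights):
                    out[t] = sum(w * d for w, d in zip(row, sub))
        return result


src_coords = [0.5, 1.5, 2.5, 3.5]
tgt_bounds = [(0.0, 2.0), (2.0, 4.0)]
index_pairs = [([0, 1], [0]), ([2, 3], [1])]

with tempfile.TemporaryDirectory() as tmp:
    partition = Partition(src_coords, tgt_bounds, index_pairs)
    partition.generate_files([Path(tmp) / f"regridder_{i}.pkl" for i in range(2)])
    # A second object can reuse the saved files without rebuilding anything.
    reloaded = Partition(src_coords, tgt_bounds, index_pairs, paths=partition.paths)
    regridded = reloaded.regrid([[1.0, 3.0, 10.0, 20.0], [2.0, 2.0, 4.0, 4.0]])
```

Looping over regridders on the outside and levels on the inside is one possible way to avoid reloading a regridder for each chunk in a vertical stack. A real version would presumably apply the regridders lazily through dask, where the order in which chunks are realised is harder to control, so this only hints at the shape of a solution.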