
Regridder Partitioning #427

Open
@stephenworsley

Description

📰 Custom Issue

There are currently problems with handling data that is too large to fit in memory (#310, #246). One workaround would be to build many smaller regridders, each responsible for a section of the source and target grid/mesh, instead of a single regridder covering the whole source and target grid/mesh. A Partition object (name to be decided) could handle building, saving and applying these regridders.
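Why splitting works can be shown with a toy model. This is a minimal sketch, assuming a regridder can be treated as a weight matrix mapping flattened source cells to flattened target cells (dense NumPy here for brevity; real regridding weights would be sparse). The helpers `full_weights`, `build_sub_regridder` and `apply_partitioned` are hypothetical and not part of any existing API. Splitting the weights by target subset reproduces the full result, provided each target subset only draws on its paired source subset.

```python
import numpy as np


def full_weights(n_src, n_tgt):
    """Hypothetical toy weights: each target cell averages an equal block of source cells."""
    ratio = n_src // n_tgt
    weights = np.zeros((n_tgt, n_src))
    for t in range(n_tgt):
        weights[t, t * ratio:(t + 1) * ratio] = 1.0 / ratio
    return weights


def build_sub_regridder(weights, src_idx, tgt_idx):
    """Hypothetical sub-regridder: the block of weights linking one source subset to one target subset."""
    return weights[np.ix_(tgt_idx, src_idx)]


def apply_partitioned(blocks, partition, src_data, n_tgt):
    """Fill each target subset from its own sub-regridder and source subset."""
    result = np.empty(n_tgt)
    for block, (src_idx, tgt_idx) in zip(blocks, partition):
        result[tgt_idx] = block @ src_data[src_idx]
    return result


n_src, n_tgt = 8, 4
W = full_weights(n_src, n_tgt)
# Each entry pairs a source subset with the target subset it covers.
partition = [(np.arange(0, 4), np.arange(0, 2)), (np.arange(4, 8), np.arange(2, 4))]
blocks = [build_sub_regridder(W, s, t) for s, t in partition]

src = np.arange(n_src, dtype=float)
partitioned = apply_partitioned(blocks, partition, src, n_tgt)
full = W @ src
print(partitioned)  # [0.5 2.5 4.5 6.5]
```

Partitioning by target cells means each output value is computed by exactly one sub-regridder, while source subsets are free to overlap if neighbouring target subsets need the same source cells.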

The Partition object would have the following functionality:

  • Can be initialised by passing a source grid/mesh, a target grid/mesh and a collection of indices describing the source and target subsets.
    • It may be possible in future to determine an appropriate collection of subsets automatically; however, this shouldn't be necessary for a minimum viable solution.
    • It may also be necessary to pass in explicit information about the dask chunking of the input object (and desired chunking for the output object).
  • There should be some level of error checking to ensure that the partition makes sense, i.e. that the entire source/target is covered and that, for each pair of source/target indices, the source points cover the target points.
    • It may also be worth checking that the calculation would be manageable for the given chunking strategy (and at least raising a warning if it would not be).
  • There should be a method for generating regridders (one from the 1st source subset to the 1st target subset, likewise for the 2nd, and so on) and saving them to a user-supplied series of paths. This way, only one regridder needs to be realised in memory at a time. The Partition object should keep track of the paths to the appropriate regridders.
    • It should also be possible to give a Partition object access to the paths of previously saved regridders, either at initialisation or via some other method.
  • When this object has access to a full set of saved regridders, it can apply them to lazily regrid data on the source grid/mesh, as if it were a regular regridder. When a chunk of data is realised, the appropriate regridder(s) are loaded, used to regrid that chunk and then discarded. In this way, only a limited number of regridders need to be held in memory at any one time.
    • One problem we may have to solve is how to realise multiple chunks in the same vertical stack, which use the same regridder, without having to load that regridder multiple times.
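The error checking described above could start from something like the following. This is a sketch under stated assumptions: cells are described by axis-aligned boxes `(x_min, x_max, y_min, y_max)` (an assumed layout), and `check_partition` and its `max_pairs` threshold are hypothetical. The bounding-box test is only a necessary condition for coverage and ignores spherical effects such as the dateline and poles, which real grids/meshes would need to handle.

```python
import warnings

import numpy as np


def check_partition(src_bounds, tgt_bounds, partition, max_pairs=None):
    """Sanity-check a hypothetical partition before any regridders are built.

    src_bounds, tgt_bounds: arrays of shape (n_cells, 4) holding
        (x_min, x_max, y_min, y_max) for each cell.
    partition: list of (src_idx, tgt_idx) pairs of integer index arrays.
    max_pairs: optional threshold on len(src_idx) * len(tgt_idx), a crude
        proxy for how large each sub-regridder could get.
    """
    n_src, n_tgt = len(src_bounds), len(tgt_bounds)

    # Every source cell must appear in at least one subset.
    src_seen = np.zeros(n_src, dtype=bool)
    for src_idx, _ in partition:
        src_seen[src_idx] = True
    if not src_seen.all():
        raise ValueError("Some source cells are not in any subset.")

    # Every target cell must be computed by exactly one sub-regridder.
    tgt_counts = np.bincount(
        np.concatenate([tgt_idx for _, tgt_idx in partition]), minlength=n_tgt
    )
    if (tgt_counts == 0).any():
        raise ValueError("Some target cells are not in any subset.")
    if (tgt_counts > 1).any():
        raise ValueError("Some target cells are in more than one subset.")

    for i, (src_idx, tgt_idx) in enumerate(partition):
        s, t = src_bounds[src_idx], tgt_bounds[tgt_idx]
        # Necessary (not sufficient) condition: the bounding box of the
        # source subset must contain the bounding box of the target subset.
        covered = (
            s[:, 0].min() <= t[:, 0].min()
            and s[:, 1].max() >= t[:, 1].max()
            and s[:, 2].min() <= t[:, 2].min()
            and s[:, 3].max() >= t[:, 3].max()
        )
        if not covered:
            raise ValueError(f"Source subset {i} does not cover target subset {i}.")
        if max_pairs is not None and len(src_idx) * len(tgt_idx) > max_pairs:
            warnings.warn(f"Subset pair {i} may be too large to regrid in memory.")


# Usage: an 8-cell source strip and a 4-cell target strip along x.
src_bounds = np.array([[i, i + 1, 0, 1] for i in range(8)], dtype=float)
tgt_bounds = np.array([[2 * j, 2 * j + 2, 0, 1] for j in range(4)], dtype=float)
good_partition = [(np.arange(0, 4), np.arange(0, 2)), (np.arange(4, 8), np.arange(2, 4))]
bad_partition = [(np.arange(0, 3), np.arange(0, 2)), (np.arange(3, 8), np.arange(2, 4))]

check_partition(src_bounds, tgt_bounds, good_partition)  # passes silently
try:
    check_partition(src_bounds, tgt_bounds, bad_partition)
except ValueError as err:
    print(err)  # Source subset 0 does not cover target subset 0.
```

Target subsets are required to be disjoint so that no output cell is written twice, whereas overlapping source subsets are allowed.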
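The generate/save/apply workflow could look roughly like this. It is a minimal, eager sketch, assuming each sub-regridder can be stood in for by a dense weight block saved with `np.save`; the `Partition` class, its `generate_files` and `regrid` methods and the `toy_block` builder are all hypothetical. A real implementation would save actual regridder objects and would presumably defer the per-chunk work with dask (for example via `dask.array.map_blocks`) rather than regridding eagerly. Loading each block once and applying it to every vertical level together is one possible answer to the vertical-stack question.

```python
import os
import tempfile

import numpy as np


class Partition:
    """Hypothetical sketch of the proposed Partition object.

    Each sub-regridder is stood in for by a dense weight block saved with
    np.save; a real implementation would save actual regridder objects.
    """

    def __init__(self, partition, n_tgt, paths=None):
        self.partition = partition  # list of (src_idx, tgt_idx) index arrays
        self.n_tgt = n_tgt
        # Paths to previously saved sub-regridders may be supplied up front.
        self.paths = list(paths) if paths is not None else []

    def generate_files(self, build_block, paths):
        """Build, save and discard one sub-regridder at a time."""
        self.paths = []
        for (src_idx, tgt_idx), path in zip(self.partition, paths):
            block = build_block(src_idx, tgt_idx)
            np.save(path, block)
            self.paths.append(path)
            del block  # only one block is held in memory at once

    def regrid(self, src_data):
        """Regrid data whose last axis is the flattened source dimension.

        Each block is loaded once and applied to all leading (e.g. vertical)
        levels together, so a vertical stack never reloads the same block.
        """
        out = np.empty(src_data.shape[:-1] + (self.n_tgt,))
        for (src_idx, tgt_idx), path in zip(self.partition, self.paths):
            block = np.load(path)
            out[..., tgt_idx] = src_data[..., src_idx] @ block.T
            del block
        return out


def toy_block(src_idx, tgt_idx):
    """Toy weights: target cell t averages source cells 2t and 2t + 1."""
    block = np.zeros((len(tgt_idx), len(src_idx)))
    for row, t in enumerate(tgt_idx):
        for col, s in enumerate(src_idx):
            if s // 2 == t:
                block[row, col] = 0.5
    return block


partition = [(np.arange(0, 4), np.arange(0, 2)), (np.arange(4, 8), np.arange(2, 4))]
data = np.arange(24, dtype=float).reshape(3, 8)  # 3 vertical levels, 8 source cells

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, f"regridder_{i}.npy") for i in range(len(partition))]
    part = Partition(partition, n_tgt=4)
    part.generate_files(toy_block, paths)
    result = part.regrid(data)
    # A fresh object can reuse the saved files by being given their paths.
    reloaded = Partition(partition, n_tgt=4, paths=paths).regrid(data)

print(result[0])  # [0.5 2.5 4.5 6.5]
```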

Metadata

Assignees

No one assigned

    Labels

    New: Issue (Highlight a new community raised "generic" issue)

    Type

    No type

    Projects

    • Status

      No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests