to_numpy can be optimized #984

@kordejong

@kraaijenbrink reported that obtaining the value of a single cell of a partitioned array is faster with a combination of from_numpy, where, and sum than with to_numpy. If this is confirmed, to_numpy can be optimized.

Example code:

import numpy as np
import lue.framework as lfr

# Slow: full raster materialization per point/sample
value = lfr.to_numpy(array)[row, col]

# Faster workaround with current Python API
mask_np = np.zeros(array_shape, dtype=np.uint8)
mask_np[row, col] = 1
mask = lfr.from_numpy(mask_np, partition_shape=partition_shape)

value_future = lfr.sum(lfr.where(mask, array, 0.0)).future
value = value_future.get()  # defer this as late as possible

Options:

  • In to_numpy, the partitions are waited upon one after the other. This can be improved by handling each partition as soon as it is ready.
  • Write partition data into the NumPy buffer in parallel.
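The first option can be sketched in pure Python as an analogy. This is not how to_numpy is implemented: `concurrent.futures` stands in for the framework's own futures, and `produce_partition` and `gather_as_completed` are hypothetical names. Each simulated partition becomes ready after a different delay, and the collector copies whichever partition finishes first instead of blocking on them in submission order.

```python
import concurrent.futures as cf
import time

import numpy as np


def produce_partition(row, col, shape, delay):
    """Hypothetical partition: becomes ready after `delay` seconds."""
    time.sleep(delay)
    data = np.full(shape, row * 10 + col, dtype=np.float64)
    return row, col, data


def gather_as_completed(array_shape, partition_shape, delays):
    """Fill a NumPy buffer, copying each partition as soon as it is ready."""
    result = np.empty(array_shape, dtype=np.float64)
    nr_rows, nr_cols = array_shape
    with cf.ThreadPoolExecutor() as pool:
        futures = []
        for row in range(0, nr_rows, partition_shape[0]):
            for col in range(0, nr_cols, partition_shape[1]):
                # Partitions at the array's edge may be smaller
                shape = (
                    min(partition_shape[0], nr_rows - row),
                    min(partition_shape[1], nr_cols - col),
                )
                delay = delays[len(futures) % len(delays)]
                futures.append(
                    pool.submit(produce_partition, row, col, shape, delay)
                )
        # as_completed yields futures in completion order, not submission order
        for future in cf.as_completed(futures):
            row, col, data = future.result()
            result[row:row + data.shape[0], col:col + data.shape[1]] = data
    return result


buffer = gather_as_completed((4, 6), (2, 3), [0.03, 0.0, 0.02, 0.01])
```

Because the partitions cover disjoint slices of the buffer, the copy in the loop could also be moved into the worker tasks, which corresponds to the second option. Whether either change helps in practice would have to be measured.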
