@kraaijenbrink reported that obtaining a partitioned array cell's value using a from_numpy, where, sum combo is faster than using to_numpy. If this is correct, then to_numpy can be optimized.
Example code:
import numpy as np
import lue.framework as lfr

# Slow: full raster materialization per point/sample
value = lfr.to_numpy(array)[row, col]
# Faster workaround with current Python API
mask_np = np.zeros(array_shape, dtype=np.uint8)
mask_np[row, col] = 1
mask = lfr.from_numpy(mask_np, partition_shape=partition_shape)
value_future = lfr.sum(lfr.where(mask, array, 0.0)).future
value = value_future.get() # defer this as late as possible
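To see why the workaround returns the same value as direct indexing, here is a minimal NumPy-only sketch of the mask, where, sum combination. It does not use the LUE API at all; `cell_value_via_mask` is a hypothetical helper name, and the small array is made-up test data.

```python
import numpy as np


def cell_value_via_mask(array, row, col):
    """Select one cell with a mask, where and sum (hypothetical helper)."""
    # Mask that is 1 at the requested cell and 0 everywhere else
    mask = np.zeros(array.shape, dtype=np.uint8)
    mask[row, col] = 1
    # Every other cell is replaced by 0.0, so the sum equals the selected cell
    return np.sum(np.where(mask, array, 0.0))


array = np.arange(12, dtype=np.float64).reshape(3, 4)
value = cell_value_via_mask(array, 1, 2)
```

In the partitioned case the same reduction yields a single scalar future, so no full array has to be assembled on the Python side.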
Options:
- In to_numpy, each partition is waited upon, in turn. This can be improved by handling ready partitions as soon as possible.
- Write partition data into the NumPy buffer in parallel.
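The first option can be illustrated with a rough sketch that uses Python's `concurrent.futures` as a stand-in for the framework's own futures. This is not how to_numpy is implemented; `produce_partition` and `gather_as_completed` are hypothetical names, and the thread pool only simulates partitions that become ready in arbitrary order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np


def produce_partition(source, row_slice, col_slice):
    # Hypothetical stand-in for an asynchronous partition computation
    return source[row_slice, col_slice].copy()


def gather_as_completed(source, partition_shape, max_workers=4):
    """Copy partitions into one buffer in the order in which they finish."""
    nr_rows, nr_cols = source.shape
    p_rows, p_cols = partition_shape
    result = np.empty_like(source)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future to the window of the result buffer it fills
        futures = {}
        for r in range(0, nr_rows, p_rows):
            for c in range(0, nr_cols, p_cols):
                window = (
                    slice(r, min(r + p_rows, nr_rows)),
                    slice(c, min(c + p_cols, nr_cols)),
                )
                futures[pool.submit(produce_partition, source, *window)] = window

        # Handle whichever partition is ready first, not the next one in order
        for future in as_completed(futures):
            result[futures[future]] = future.result()

    return result


source = np.arange(35, dtype=np.float64).reshape(5, 7)
gathered = gather_as_completed(source, partition_shape=(2, 3))
```

Because the windows do not overlap, the copy into `result` could also be moved into the worker tasks, which would correspond to the second option.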