
Conversation

@adebardo
Contributor

@adebardo adebardo commented Mar 27, 2025

(Edit from @rhugonnet to add a description)

This PR refactors BlockwiseCoreg, rewriting most of the code to make it shorter and changing its behaviour as follows:

  • The functionality now supports out-of-memory use through Multiprocessing,
  • The chunk size of the subdivision can be variable in X/Y,
  • It still only supports the Nuth and Kääb method,
  • The apply step now fits a 2D plane independently to the X, Y and Z shift outputs, using RANSAC for outlier filtering,
  • The re-gridding of the apply step now uses grid() from GeoUtils instead of the old warp_dem function, which removes the artefacts from #584 (Fix BlockwiseCoreg errors).
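As an illustration of the plane-fit-with-RANSAC idea in the apply step, here is a minimal sketch using scikit-learn's RANSACRegressor; the function name, tile coordinates and shift values below are hypothetical, and the actual implementation in the PR may differ:

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

def fit_shift_plane(tile_x, tile_y, shifts):
    """Fit a 2D plane shift = a*x + b*y + c to per-tile shift estimates,
    using RANSAC to discard tiles where the coregistration diverged."""
    coords = np.column_stack([tile_x, tile_y])
    # Default estimator is LinearRegression, i.e. a plane in (x, y)
    model = RANSACRegressor(random_state=42)
    model.fit(coords, shifts)
    return model  # model.predict(...) evaluates the plane anywhere

# Hypothetical per-tile X-shift estimates: a gentle ramp plus two outliers
tx, ty = np.meshgrid(np.arange(5.0), np.arange(5.0))
tx, ty = tx.ravel(), ty.ravel()
dx = 0.1 * tx - 0.05 * ty + 2.0
dx[[3, 17]] += 50.0  # two tiles where the per-block fit diverged
plane = fit_shift_plane(tx, ty, dx)
```

The same fit would be repeated once per component (X, Y and Z shifts), and the fitted planes evaluated on the full output grid.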

Resolves #584
Resolves #645
Resolves #697
Resolves #601
Resolves #698

Original message

Lots of work to do but we will use this PR to debate tomorrow

@adebardo
Contributor Author

(screenshot attachment)

@rhugonnet
Member

rhugonnet commented Mar 29, 2025

Great! 🙂

I went through all the code as we mentioned in our discussion yesterday. All good on the functionality. In terms of the structure and consistency with the rest of the package, here are my remarks (difficulty level of each change in parentheses):

  1. (Easy) As mentioned in the call, we should likely add plane_fit_ransac as an option of apply(), keeping the behaviour consistent with BlockwiseCoreg otherwise,
  2. (Easy-Moderate) I don't understand why reference_dem and to_be_aligned_dem need to be set in __init__. The tiling_grid could be defined during fit(), as it only relates to reference_dem (and not to the coregistration method itself), and the profile could be saved there too?
  3. (Moderate) After reviewing the functions, it would be a bit of effort but not as difficult as I anticipated to integrate these changes into BlockwiseCoreg directly, and trigger the out-of-memory behaviour by passing a MultiprocConfig to fit() and apply().

For 3.: Why? I realized we don't actually need to mirror the full complexity of #525 for BlockwiseCoreg, thanks to BlockwiseCoreg having its own fit()/apply() functions independent of Coreg.fit()/apply().
Since BlockwiseCoreg splits the input array very early on (and once we have blocks, we don't need any memory considerations), we don't need as many complex out-of-memory processing steps during the rest of the coregistration as the generic Coreg.fit()/apply() steps in #525 do. For instance, ignoring the preprocessing, coreg.NuthKaab() still requires out-of-memory subsampling, then out-of-memory interpolation that re-runs at each iteration of the algorithm, depending on the current shift.
Here, we'd only need two simple, one-time, out-of-memory steps at the very beginning, and that's it!

In short: We could pass down the MultiprocConfig to the preprocess_fit/apply functions, and define filenames in MultiprocConfig for these two "break points" (that will create a file for the "reprojected_dem" and "inlier_mask"). Then we just continue the rest of the coregistration with these two files!

So, in short, I think 1-2. shouldn't be a problem.
For 3., it's a bit more effort but not as much as initially thought, and would be ideal to avoid splitting into two classes. I'm happy to help out on this to finalize it in time 😉

Finally, I also have one question: Is there any core difference between our interp_points() and resample_dem()?
We have a grid resample wrapper based on interp_points() (tested to match exactly GDAL.warp in a same CRS), which would be easy to run with Multiprocessing:

(see `_reproject_horizontal_shift_samecrs()`)

We generally prefer to use this interp_points() implementation for consistency due to:

  • Its integrated nodata propagation scheme depending on the resampling, to avoid unexpected outliers,
  • Its understanding of pixel interpretation (center or corner of pixel), to avoid unexpected shifts,
  • It is the resampling used by the coregistration algorithms, so it ensures the resulting DEM alignment is fully consistent (otherwise, with a different resampling here, re-running the coregistration algorithm would likely find another shift instead of (0, 0, 0)).

Some of the issues I mention above sound like the ones you described in the call, so this might be part of the solution.
In terms of performance of our resampling: the default "bilinear" sampling relies on scipy.ndimage.map_coordinates (to harness the regular-grid nature of a raster, where the X and Y axes are equally spaced), and is therefore much faster than Xarray/SciPy's interpn (which works on a rectilinear grid, where X and Y can have arbitrary coordinates).
It's likely not as fast as cars-resample, but shouldn't be too far off 😉
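A quick sketch of the equivalence between the two on a regular grid (both are bilinear here, so they should agree exactly away from the boundaries; the DEM and sample points are synthetic):

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.interpolate import interpn

rng = np.random.default_rng(42)
dem = rng.normal(size=(100, 100))

# Interior sample points, in pixel coordinates (grid spacing of 1)
ys = rng.uniform(1, 98, size=1000)
xs = rng.uniform(1, 98, size=1000)

# map_coordinates exploits the regular grid: order=1 is bilinear
a = map_coordinates(dem, np.vstack([ys, xs]), order=1)

# interpn treats the same grid as rectilinear (same result, but slower)
b = interpn((np.arange(100), np.arange(100)), dem,
            np.column_stack([ys, xs]), method="linear")

assert np.allclose(a, b)
```

The speed gap grows with grid size, since map_coordinates never has to search the coordinate axes.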

@adebardo adebardo force-pushed the 698-scaledblockwise branch from ec5e5da to 6ce35f3 Compare April 3, 2025 15:44
@rhugonnet
Member

@adebardo I thought about the apply step again.

Performing the resampling we talked about using map_overlap (more likely a modified version resembling the original implementation of reproject_multiproc from @vschaffn that gets the right shifted block with overlap) coupled with reproject_samecrs() should work well.
My only worry is that we might get tiling artefacts at the boundaries of tiles. Having smaller tiles would solve this, but then the Nuth and Kääb algorithm would not have enough points to converge, so we'd be blocked.
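To make the boundary-artefact concern concrete, here's a minimal NumPy sketch of why processing tiles with an overlap (then trimming it, map_overlap-style) removes tile-edge artefacts for any neighbourhood-based operation; a 3x3 mean filter stands in for the real per-tile resampling:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(1)
arr = rng.normal(size=(64, 64))
ref = uniform_filter(arr, size=3, mode="nearest")  # whole-array result

def per_tile(arr, tile=16, depth=1):
    """Apply the filter tile by tile, reading `depth` extra pixels of
    overlap around each tile and trimming it afterwards."""
    out = np.empty_like(arr)
    n = arr.shape[0]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # Expand the tile by `depth`, clipped to the array bounds
            i, j = max(i0 - depth, 0), max(j0 - depth, 0)
            i1 = min(i0 + tile + depth, n)
            j1 = min(j0 + tile + depth, n)
            block = uniform_filter(arr[i:i1, j:j1], size=3, mode="nearest")
            # Trim the halo before writing back
            out[i0:i0 + tile, j0:j0 + tile] = \
                block[i0 - i:i0 - i + tile, j0 - j:j0 - j + tile]
    return out

# With an overlap >= the filter radius, the tiled result is exact;
# without it, tile boundaries produce artefacts.
assert np.allclose(per_tile(arr, depth=1), ref)
assert not np.allclose(per_tile(arr, depth=0), ref)
```

For resampling after a shift, the required depth would instead be the shift magnitude plus the interpolation kernel radius, which is what the modified reproject_multiproc would have to account for.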

Thinking ahead on this, I see two solutions:

So I think we should have it covered in any case 😉

@adebardo adebardo force-pushed the 698-scaledblockwise branch 2 times, most recently from 602ce15 to 08ddde3 Compare April 16, 2025 13:23
@adebardo
Contributor Author

Hi @rhugonnet , I know it's not perfect, but could you take a look? I imagine there will be quite a few changes, and I'd like to get everything aligned before I properly finish the tests.

@rhugonnet
Member

Looking good! Shorter and better code 😄

I don't have any generic comment, so will just add a couple line-by-line!

@rhugonnet
Member

rhugonnet commented Apr 16, 2025

It's great, thanks a lot for all the work!
Much more concise, readable and efficient code 🙂

I have an idea that might help with the memory-usage issues you showed (in case you don't know the I/O behaviour with chunks; if you do, sorry for explaining again!):
There are chunks "on disk", which are different from chunks "in memory". In a file (TIF, COG), a raster is written with certain chunk sizes defined in the file, say (300, 300). If you then read a (20, 20) window with Rasterio, it's actually the full (300, 300) chunk that gets loaded, before being subset to (20, 20).
So, if your memory chunks match your disk chunks at (300, 300) and you use a non-zero depth with map_overlap, you'll load 9 tiles of (300, 300) (the central one plus its 8 neighbours) all at once just to compute the overlap. That's a big memory peak.

In short, you want two things: 1/ use memory chunks that are a large multiple (x4-x10) of your disk chunks, and 2/ ensure your disk chunks aren't too big (many rasters have only 4 disk chunks, so almost everything gets loaded all the time, no matter the memory chunks).
Sometimes you need to re-encode the disk chunks (rewrite the TIF/COG) to your desired sizes to get the optimal memory usage. I do this in GeoUtils' out-of-memory test (for netCDFs, but it's similar for TIF/COG) here: https://github.com/GlacioHack/geoutils/blob/10519e99f9f8ec8ee76a72ac9702e2eac8467d90/tests/test_raster/test_distributing_computing/test_delayed_dask.py#L273
Maybe that helps for the memory tests 😉

@adebardo adebardo changed the title feat: new blockwise for large data Refactor blockwise coregistration method Apr 17, 2025
@adebardo adebardo force-pushed the 698-scaledblockwise branch 6 times, most recently from cc30471 to 3d88b62 Compare April 24, 2025 07:40
@rhugonnet
Member

Looking good, not much to say on the code side!
I'm mostly curious about what the shifted DEM looks like with the new apply using grid(), and how well the memory usage performs! 😊

@adebardo adebardo force-pushed the 698-scaledblockwise branch from 9a49305 to 64ce190 Compare April 29, 2025 12:23
@vschaffn vschaffn merged commit 8af72d8 into main Apr 29, 2025
24 checks passed
@vschaffn vschaffn deleted the 698-scaledblockwise branch April 29, 2025 12:53
@rhugonnet
Member

Great! 😄
@adebardo Would be interesting to see if we still have the edge effects with the new apply method, by updating the plot here: https://xdem.readthedocs.io/en/stable/coregistration.html#the-blockwisecoreg-object
I see it's not there in the latest docs, which updated after this PR was merged: https://xdem.readthedocs.io/en/latest/coregistration.html#the-blockwisecoreg-object

@rhugonnet
Member

There are some rendering issues with the changes in the doc too, we'll have to fix that in another PR.
