Skip to content

Conversation

@dbrakenhoff
Copy link
Contributor

@dbrakenhoff dbrakenhoff commented Nov 27, 2025

Major update of GridIntersect module.

@Reviewer(s), I have a few things I wonder about, especially the naming of things:

  • A new method called points_to_cellids() for locating points in the grid. Is this name clear? And is cellid the correct name for (row, column) or (cellid,) output?
  • I use "cellids" as a field name in the result but when return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids.
  • Anyway, a critical look at the naming of stuff in the module would be greatly appreciated!

Support for numpy arrays of shapely geometries

Introduces support for arrays of shapely geometries in GridIntersect.intersects() method.

gi = GridIntersect(grid)
shapes = shapely.polygons(...)
gi.intersects(shapes)

Returns rec-array with a shp_id field containing the index of the original shape in shapes and the cellids of the candidate cells the polygons intersect with, for example:

rec.array([(0, (np.int64(1), np.int64(0))),
           (0, (np.int64(1), np.int64(1))),
           (1, (np.int64(0), np.int64(0))),
           (1, (np.int64(0), np.int64(1)))],
          dtype=[('shp_ids', '<i8'), ('cellids', 'O')])

So in that example output polygon 0 intersects with 2 grid cells: (1, 0) and (1, 1), and polygon 1 intersects with 2 grid cells: (0, 0), and (0, 1).

Fast point locating

GridIntersect was missing a method to obtain one cellid for each input point. The implementation of GridIntersect.intersects() drops all points outside the model grid, and potentially returns all intersection results for each point, meaning there is no 1:1 output between input points and output cellids. The GridIntersect.intersect() method allows a MultiPoint as input, but is inefficient, and groups points in the same cell as MultiPoints, also losing the 1:1 relation.

So a new method is introduced GridIntersect.points_to_cellids(). This method guarantees len(input) == len(output). Cellids are returned for each point and NaN is returned if the point is outside the model grid. For points on boundaries the lowest grid cellid is returned.

Support for points with z-coordinates

Adds support for points with z-coordinates in GridIntersect.intersect() and GridIntersect.points_to_cellids() with the handle_z-kwarg.

  • The default is "ignore" which ignores the z-coordinates, similar to what GridIntersect did before.
  • The second option is "drop" which removes the intersection results if the point lies above or below the model grid. This option is only included in intersect().
  • The final option is "return" which returns the computed layer position for each point. This returns NaN if the point is outside the model domain, +inf if the point is above the model and -inf the point is below the model.

Deprecations

Deprecates everything related to method="structured"

Other stuff

  • Cleaned up the module somewhat with some renaming since the structured methods dropped out
  • Used shapely plotting routines for linestrings and points to simplify plotting calls, make static plot methods accept shapely geometries instead of rec-array results so they can also be used to plot input geometries.
  • Update tests to test new features and clean up old tests
  • Update example notebook.

- deprecate method kwarg
- remove keepzerolengths from intersect()
- clean up deprecated methods
…ts()

- allow arrays of numpy arrays in intersects(), returns all candidate cells for intersection per geometry
- add predicate to query_grid()
- simplify names ofintersection methods, deprecate old internal methods
- move logic to translate node number to rowcol to separate method
- flip arguments in filter_query_result to be consistent with other class methods
- remove get_gridshapes method()
- move imports to top of file
- add points_to_cellids() for fast 1:1 mapping between points and grid cells, return nan when outside of grid
- rename return_cellids to return_nodenumbers for clarity (node numbers always one number, cellid is somewhat ambiguous)
- use shapely plotting for linestrings and points
- add handle_z options: "ignore", "drop" and "return" to intersect() and points_to_cellids()
- add method to compute layer position for z coordinates
- improve _intersect_point() method by using intersects as starting point.
- remove tests for deprecated methods
- add array tests for intersects() and points_to_cellids()
- activate+improve tests for 3D points
- make staticmethods accept shapely geometries as input instead of rec-arrays
- update notebook
@dbrakenhoff
Copy link
Contributor Author

Tests failing because MF6 examples still use method kwarg. Getting tests to pass, will require a change in modflow6-examples repo:

MODFLOW-ORG/modflow6-examples#315

- replace 3 instances where new nodenumber to rowcol method should be used
@aacovski
Copy link
Contributor

aacovski commented Nov 27, 2025

@dbrakenhoff First off, apologies for overlapping with your work...

searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR.

The key difference from shapely: shapely handles non-convex cell geometries, whereas our approach assumes convex hulls. This assumption is valid for typical MODFLOW grids (rectangles, Voronoi cells, triangles) and enables significant simplifications and speedups. More importantly, the convex hull approach generalizes cleanly to higher-dimensional shapes - the same hyperplane equations can work for lines, polygons, and polyhedrons without needing shapely at all. I can work with you in the future once I'm done with the point to cell PR for higher-dimensional shapes.

@dbrakenhoff
Copy link
Contributor Author

@dbrakenhoff First off, apologies for overlapping with your work...

No worries, there were already 2 places in flopy with grid intersection operations, so our simultaneous work hasn't changed that, but they have become slightly more similar now that they both support arrays of points.

searchsorted (structured) and kdtree (vertex, unstructured) in my PR already provides 1:1 point-to-cell mapping - each query point returns exactly one cell (or nan if outside, can change from nan). We could add wrappers or extend grid.intersect to use this approach while maintaining the existing API... So I'm a bit confused by the timing of this PR.

I realize there is some overlap with your work, especially between the new points_to_cellids() method and the array-based grid.intersect() methods you built for grids. The inefficient method for intersecting points in GridIntersect has bugged me for quite some time, plus as you can see in the list of issues and the TODOs that were fixed in the code itself, there was a bunch of work we still wanted to do on the module, which is all included in this PR. Also I felt it was weird for GridIntersect to have good support for linestrings and polygons but not have a decent implementation for points. Yes I could have just used self.mfgrid.intersect() but since the whole GridIntersect module was written around shapely, I wanted to stay with shapely for now, and see what I could achieve with those tools. Shapely's approach might have some limitations, but it seems very fast, and very flexible for the most common intersection operations.

So the questions now are, do we continue to let the two live side-by-side? Or do we integrate them and implement grid.intersect() in GridIntersect for point intersection calls (I would be curious to see how the performance compares?) ? But this might also be more of a question for flopy4, where a significant redesign of flopy will take place. And in the meantime we can enjoy having two intersection methods :).

@wpbonelli
Copy link
Member

do we continue to let the two live side-by-side?

I think this is fine. 4.x is a natural time for consolidation. at some point we'll need to make the decision which to delegate to for xarray spatial queries e.g. stuff like model.grid.head.sel(x=250, y=650, z=50, method='nearest') right? but unless you guys want to try the integration sooner I agree with @dbrakenhoff that

we can enjoy having two intersection methods :)

Other comments

A new method called points_to_cellids() for locating points in the grid. Is this name clear?

Clear to me, I like it

And is cellid the correct name for (row, column) or (cellid,) output?

Yes, but re:

"cellids" as a field name in the result but when return_nodenumbers=True, the intersects result can also return node numbers instead of (row, columns) for structured grids

while I like the cellid/nn toggle, I wonder if it should return the elements of cellid as separate columns. While people may be used to seeing cell ID tuples as a single recarray column we have had a lot of internal discussion about how awkward it is and are thinking of always flattening them into separate columns in 4.x. as this parameter is new I'm tempted to say we are free to shed old baggage here

Relatedly, maybe singular instead of plural naming for the columns i.e. shp_ids -> shp_id etc?

Adds support for points with z-coordinates in GridIntersect.intersect() and GridIntersect.points_to_cellids() with the handle_z-kwarg. The default is "ignore" which ignores the z-coordinates, similar to what GridIntersect did before.

Just wondering if there is a motivation for "ignore" besides backwards-compatibility? I guess that is sufficient cause but it seems nicer if this could be a bool switch instead of a string. maybe something for 4.x

@wpbonelli
Copy link
Member

maybe also consider returning pd.DataFrame instead of np.recarray as pandas is now a core dependency?

@dbrakenhoff
Copy link
Contributor Author

Thanks for the comments @wpbonelli! I'll prepare some fixes based on your comments:

  • return structured result with additional "row" and "col" columns, vertex result with "cellid" column. I will continue to return the original result in "cellids" so old code doesn't break...
  • return pd.DataFrame instead of recarray. This is something of a breaking change since the return type will change, but since the syntax for accessing the columns is still sort of the same, i think it is okay to default to pandas DataFrames? So basically we flip the dataframe. Perhaps I can set the kwarg to None and give a DeprecationWarning and return a DataFrame, allow silencing through dataframe=True and old behavior through dataframe=False.
  • I can switch handle_z to be boolean and effectively only support "ignore" and "return". I added "drop" to automatically drop the NaNs for points above/below the model grid, but we can leave that to the user. Which should be easy when the return object is a pandas.DataFrame.
  • I will convert "shp_id" to singular naming, but the others were already plural, so changing them will break peoples scripts... So I guess those others: "cellids"--> "cellid", "ixshapes" --> "geometry" are something for 4.x?

And yea, it makes sense to defer the integration to 4.x. I think @aacovski's work more naturally fits with the xarray implementation, whereas GridIntersect() might continue as a utility with a vector based approach for linestrings and polygons. For point intersections the two are very similar, so it would make sense to maybe defer those operations to the new modelgrid object. If there are performance gains when computing grid cell intersection candidates using the new modelgrid object, we can use the grid to filter the candidates, and then run the intersection computations. But we'll figure that out when we get there :).

As for this code: model.grid.head.sel(x=250, y=650, z=50, method='nearest') is great for getting the nearest cell center, but doesn't guarantee you're selecting the cell the point (x,y) is in, e.g. in quadtree grids at the refinement boundaries, so that might be something to think about.

@wpbonelli
Copy link
Member

return pd.DataFrame instead of recarray. This is something of a breaking change since the return type will change, but since the syntax for accessing the columns is still sort of the same, i think it is okay to default to pandas DataFrames? So basically we flip the dataframe. Perhaps I can set the kwarg to None and give a DeprecationWarning and return a DataFrame, allow silencing through dataframe=True and old behavior through dataframe=False.

Yeah since dataframe syntax is basically a superset of recarray syntax I think this is fine. Sounds good to me though I will admit I am not a frequent user of these APIs so am not personally much affected.

I can switch handle_z to be boolean and effectively only support "ignore" and "return". I added "drop" to automatically drop the NaNs for points above/below the model grid, but we can leave that to the user. Which should be easy when the return object is a pandas.DataFrame.

If you think it will be a common case to want to drop NaNs I can see the case for the string, no strong feelings on this. But still a bool feels cleaner and it is easy for the caller to drop them.

I will convert "shp_id" to singular naming, but the others were already plural, so changing them will break peoples scripts... So I guess those others: "cellids"--> "cellid", "ixshapes" --> "geometry" are something for 4.x?

Ah, ok- yeah, then that's for 4.x.

As for this code: model.grid.head.sel(x=250, y=650, z=50, method='nearest') is great for getting the nearest cell center, but doesn't guarantee you're selecting the cell the point (x,y) is in, e.g. in quadtree grids at the refinement boundaries, so that might be something to think about.

Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard?

@dbrakenhoff
Copy link
Contributor Author

Do you mean smack on the boundary? In that case could we just use the same lowest ID tie-breaking rule? Or am I misunderstanding- how is a refinement boundary different from any shared cell boundary in this regard?

I meant this situation. Not necessarily a problem but getting the nearest value and getting the exact cellid your point lies in can be different.

image

@wpbonelli
Copy link
Member

Ahh. thanks. nearest cell center. I was confusing that with some notion of "nearest cell" which I recognize now is ill-defined absent a projection scheme I guess..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

4 participants