-
Notifications
You must be signed in to change notification settings - Fork 443
GeoDataset: rtree -> geopandas #2747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
85cceca
to
d72cdd3
Compare
@@ -354,10 +301,10 @@ class RasterDataset(GeoDataset): | |||
date_format = '%Y%m%d' | |||
|
|||
#: Minimum timestamp if not in filename | |||
mint: float = 0 | |||
mint: datetime = pd.Timestamp.min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pandas defaults to datetime64[ns]
, which has a much smaller range than datetime
:
>>> import pandas as pd
>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145224193')
>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
>>> datetime.min
datetime.datetime(1, 1, 1, 0, 0)
>>> datetime.max
datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
I can't imagine this being an issue, but something to be aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May actually convert the types directly to pd.Timestamp, datetime-compatibility is not great...
I mostly have the dataset base classes working now. However, the following datasets are either custom or hacky, making the porting a bit tricky. I would appreciate help from the people who wrote these datasets:
@isaaccorley maybe you can help with some of these as well. |
First question from me is about speed -- do we have a benchmark script to understand before/after lookup speed? |
We have both https://torchgeo.readthedocs.io/en/latest/user/contributing.html#i-o-benchmarking (for TorchGeo) and https://github.com/zhu-xlab/spatiotemporal-indexing and (for rtree/geopandas/etc.). |
It looks like this PR isn't at the point where I can run the torchgeo IO benchmarks (I switched to this branch, tried, and got |
That's the goal. There will likely be minor differences but we can correct those as we go. |
What is required here from my side? Checking if the dataset works after changes? |
@adrianboguszewski LandCover.ai Geo doesn't look too bad, but we need to update the |
@adamjstewart yeah, little busy, so would be great if you create a PR and I test it :) |
@adamjstewart, @adrianboguszewski, let me take up landcover.ai dataset to get me into shape in adapting dataset classes |
d2f40e2
to
73567c5
Compare
@calebrob6 @nilsleh @sfalkena @lccol @isaaccorley I would like to have this PR completed in the next couple days. If you're able to finish converting the remaining custom datasets to geopandas by then let me know. If not, I can start working on them, as everything else is now done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error in l7irish
fa1fd8e
to
78866ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never been so happy to see so many lines of code deleted
This PR migrates our GeoDataset indices from rtree to geopandas, and is part of ongoing work to add time series support to TorchGeo: #2382.
Progress
Backwards-Incompatible Changes
I've tried to keep backwards-incompatible changes in this particular PR to a minimum, but there were a few that were unavoidable or minor:
GeoDataset
now uses pyproj CRS instead of rasterio CRSGeoDataset
no longer provides a default__init__
methodBoundingBox
now takes datetime objects instead of floats for mint/maxtroi_split
now includes geometries like LineString and Point with zero-area overlapSome of these changes could be undone if necessary, but I expect to have many more backwards-incompatible changes in future PRs.
Open Questions
VectorDataset
now that we can store arbitrary Polygons in the index?IntervalIndex
,PeriodIndex
, andDatetimeIndex
?date
,datetime
,pd.Timestamp
,pd.Period
?Point
,Polygon
, etc.?GeoDataset
andNonGeoDataset
(andTileDataset
)?Motivation
Switching from rtree to geopandas is no small feat. This decision was made after careful consideration of several options and alternatives. The decision to switch from rtree to geopandas was made due to geopandas support for the following features that rtree does not support (items in bold are critical for time series integration):
index.dissolve
index.groupby
pandas.unique
index.to_crs
Geometry
s: includingPoint
s and non-rectangularPolygon
s