Heads up that there's now an MVP (minimal viable pipeline) starting from STAC API queries:
graph LR
subgraph STAC DataPipeLine
A["IterableWrapper (list[dict])"] --> B
B["PySTACAPISearcher (list[pystac_client.ItemSearch])"] --> C
C["Mapper (list[pystac.ItemCollection])"] --> D
D["StackstacStacker (list[xarray.DataArray])"]
end
where the steps are:
Hoping to finish this by the end of the week 🤞, and will cut a new v0.5.0 release soon after 😁
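To make the data flow in the diagram above a bit more concrete, here is a minimal pure-Python sketch of the four stages chained as generators. It deliberately does not use zen3geo, torchdata, pystac-client or stackstac: `wrap`, `search`, `collect` and `stack` are hypothetical stand-ins that only mimic the shape of what each stage hands to the next, and the in-memory `CATALOG` and its keys are made up for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical in-memory "catalog"; a real one would sit behind a STAC API.
CATALOG = [
    {"id": "scene-1", "collection": "demo", "values": [1, 2]},
    {"id": "scene-2", "collection": "demo", "values": [3, 4]},
    {"id": "scene-3", "collection": "other", "values": [5, 6]},
]


def wrap(queries: Iterable[dict]) -> Iterator[dict]:
    """Stage A (IterableWrapper stand-in): yield one query dict at a time."""
    yield from queries


def lazy_search(query: dict) -> Iterator[dict]:
    """Lazily yield the catalog items that match a single query."""
    for item in CATALOG:
        if item["collection"] == query["collection"]:
            yield item


def search(queries: Iterable[dict]) -> Iterator[Iterator[dict]]:
    """Stage B (searcher stand-in): turn each query into a lazy search."""
    for query in queries:
        yield lazy_search(query)


def collect(searches: Iterable[Iterator[dict]]) -> Iterator[list[dict]]:
    """Stage C (Mapper stand-in): resolve each lazy search into a list of items."""
    for one_search in searches:
        yield list(one_search)


def stack(collections: Iterable[list[dict]]) -> Iterator[list[list[int]]]:
    """Stage D (stacker stand-in): combine the items of one collection."""
    for items in collections:
        yield [item["values"] for item in items]


# Nothing runs until the final stage is iterated over.
pipeline = stack(collect(search(wrap([{"collection": "demo"}]))))
stacked = list(pipeline)
print(stacked)  # [[[1, 2], [3, 4]]]
```

Each stage only knows about the objects it receives and the objects it yields, so any one of them could be swapped out without touching the others.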
Note that zen3geo v0.6.0 comes with an XpySTACAssetReader DataPipe for reading STAC assets backed by COG/NetCDF/Zarr files, done in #87. This is essentially a wrapper around
To enable cloud-native, streaming machine learning data pipelines based on SpatioTemporal Asset Catalogs (STAC)!
A torch DataPipe is a way of doing composition over inheritance. The philosophy is to have each DataPipe do one thing and do it well, similar to the UNIX philosophy of piping one piece of text to another command. The pipe syntax also has parallels with the method-chaining style of pandas (see pandas.DataFrame.pipe).
📖 STAC Readers
The STAC specification has 4 parts (see https://stacspec.org/en/about/stac-spec), and one idea is to have an individual DataPipe for each of the STAC Item/Catalog/Collection/API, as hinted in torchgeo/torchgeo#412 (comment):
- PySTACItemReader wrapping pystac.Item.from_file (✨ PySTACItemReaderIterDataPipe for reading STAC Items #46)
- PySTACCatalogReader wrapping pystac.Catalog.from_file for static catalogs.
- PySTACCollectionReader wrapping pystac.ItemCollection.from_file
- PySTACAPISearcher wrapping e.g. pystac_client.Client.search for dynamic catalogs (✨ PySTACAPISearchIterDataPipe to query dynamic STAC Catalogs #59)
See also https://stacspec.org/en/about/stac-spec/
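As a rough illustration of the "one reader DataPipe wrapping one from_file-style function" idea in the list above, here is a minimal sketch that only needs the standard library. The `IterableWrapper` and `ItemReader` classes are simplified, hypothetical stand-ins (not the torchdata or zen3geo classes), and `load_json` stands in for a loader such as pystac.Item.from_file, so the sketch runs without pystac installed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator


class IterableWrapper:
    """Simplified stand-in: wraps any iterable so it can start a pipeline."""

    def __init__(self, iterable: Iterable[Any]) -> None:
        self.iterable = iterable

    def __iter__(self) -> Iterator[Any]:
        yield from self.iterable


class ItemReader:
    """Does one thing: apply a from_file-style loader to each incoming path."""

    def __init__(
        self, source_datapipe: Iterable[str], loader: Callable[[str], Any]
    ) -> None:
        self.source_datapipe = source_datapipe
        self.loader = loader

    def __iter__(self) -> Iterator[Any]:
        for path in self.source_datapipe:
            yield self.loader(path)


def load_json(path: str) -> dict:
    """Hypothetical loader used here in place of a real STAC item reader."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)


# Write two tiny made-up item files, then read them back through the pipeline.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("a", "b"):
        path = Path(tmp) / f"{name}.json"
        path.write_text(json.dumps({"type": "Feature", "id": name}), encoding="utf-8")
        paths.append(str(path))

    datapipe = ItemReader(IterableWrapper(paths), loader=load_json)
    item_ids = [item["id"] for item in datapipe]

print(item_ids)  # ['a', 'b']
```

Because the loader is passed in rather than inherited, a catalog or item-collection reader could reuse the same wrapper with a different loader function.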
💾 STAC I/O
Coming from the STAC Readers, the STAC objects (Item, ItemCollection, etc) would then need to be read into memory using some I/O library. These I/O libraries would handle the stacking of Assets as mentioned in torchgeo/torchgeo#412 (comment). E.g.
- StackstacStacker wrapping stackstac.stack which returns an xarray.DataArray (✨ StackSTACStackerIterDataPipe for stacking STAC items #61)
- ODCstacLoader wrapping odc.stac.load which returns an xarray.Dataset
Note: See also opendatacube/odc-stac#54 (comment) for differences between stackstac and odc-stac
🐕🦺 STAC services (requiring authentication)
- planetary_computer has their STAC catalog at https://planetarycomputer.microsoft.com/api/stac/v1/, and there are some (but not all) Collections which require signing/authentication
- radiant-mlhub has their own STAC catalog/API library, as mentioned in Add STACAPI dataset torchgeo/torchgeo#412 (comment)
Note: The authentication/signing can be handled via the parameters and/or modifier parameters in pystac_client.Client.open (I think).
🥤 Example 'DataPipeLine'
graph LR
subgraph STAC DataPipeLine 1
A["IterableWrapper (list[url])"] --> B
B["PySTACItemReader (list[pystac.Item])"] --> C
C["StackstacStacker (xarray.DataArray)"]
end
🧑🤝🧑 Open for contributions
Anyone is welcome to comment on the details (e.g. naming the DataPipes, what else is needed, etc), or open a Pull Request directly to implement a DataPipe (see https://zen3geo.readthedocs.io/en/latest/CONTRIBUTING.html#running-things-locally on getting started)!
One thing to note is that I've designed zen3geo explicitly so that dependencies are optional by default, so if someone doesn't use odc-stac for example, they shouldn't have to install it. Just bear this in mind when you're writing up the code.
Cc @jamesvrt, @rbavery, @KennSmithDS
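For anyone wondering what "optional by default" could look like in code, here is one possible sketch of the pattern mentioned above, using only the standard library. It is not taken from the zen3geo codebase: `optional_import` and `LazyLoader` are hypothetical names, and the point is only that the third-party import happens when the DataPipe is actually used, not when the package is imported.

```python
import importlib
from types import ModuleType
from typing import Iterator


def optional_import(name: str) -> ModuleType:
    """Import a module on demand, with a clearer error if it is missing."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"Package '{name}' is required by this DataPipe but is not installed."
        ) from err


class LazyLoader:
    """Hypothetical DataPipe-like class with a deferred dependency."""

    def __init__(self, module_name: str) -> None:
        # Only the name is stored, so constructing never triggers an import.
        self.module_name = module_name

    def __iter__(self) -> Iterator[str]:
        module = optional_import(self.module_name)
        yield module.__name__


# A module that exists (from the standard library) loads fine on iteration.
loaded = list(LazyLoader("json"))

# A missing package only fails once the pipe is iterated over.
pipe = LazyLoader("not_a_real_package_xyz")
try:
    list(pipe)
except ModuleNotFoundError as err:
    message = str(err)

print(loaded)
print(message)
```

Users who never touch a given DataPipe would then never need its backing library installed.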
Originally discussed in torchgeo/torchgeo#412, xref torchgeo/torchgeo#576