Commit 8140021

Merge pull request #12 from xcube-dev/konstntokas-xxx-large_datarequest
Support large dataset requests for stac and cds
2 parents b7d8c70 + 2b5fbd7 commit 8140021

24 files changed, +479 -99 lines changed

CHANGES.md

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,4 @@
-## Changes in 0.3.0 (in development)
+## Changes in 0.3.0

 * **Unified spatial resampling**: Switched to the new
   `xcube_resampling.resample_in_space` library, replacing
@@ -14,6 +14,9 @@
   all requested geolocations of the datacubes to be generated.
 * Fixed visualization of preload handle
 * Experimental configuration setup GUI added using panel.
+* Implemented a new access handler that supports large dataset requests by
+  automatically splitting and retrieving data from the `"cds"`, `"stac-cdse-ardc"`,
+  and `"stac-pc-ardc"` data stores.


 ## Changes in 0.2.0

README.md

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@
 xcube Multi-Source Data Store: Seamlessly Integrating and Harmonizing Data from
 Multiple Sources.

-Find out more in the [xcube Multi-Source Data Store Documentation](https://xcube-dev.github.io/xcube-multistore/).
+Find out more in the [xcube Multi-Source Data Store Documentation](https://xcube-dev.github.io/xcube-multistore/).

docs/config.md

Lines changed: 6 additions & 1 deletion
@@ -54,6 +54,7 @@ This configures a dataset object from a single data source.
 * **[data_id](#data_id)**
 * **[open_params](#open_params)**
 * **[format_id](#format_id)**
+* **[chunksize](#chunksize)**
 * **[custom_processing](#custom_processing)**
 * **[spatial_resample_params](#spatial_resample_params)**
 * **[temporal_resample_params](#temporal_resample_params)**
@@ -69,6 +70,7 @@ opened individually, and once all datasets are loaded, they are merged using
 * **[identifier](#identifier)**: This identifier defines the name of the final data cube.
 * **[grid_mapping](#grid_mapping)**
 * **[format_id](#format_id)**
+* **[chunksize](#chunksize)**
 * **[xr_merge_params](#xr_merge_params)**
 * **variables**: List of [variable object](#variable-object) objects.

@@ -152,7 +154,7 @@ Identifier that assigns a grid mapping to the final dataset for reprojection.
 Unique identifier for the dataset's data source within the assigned data store.

 ### open_params
-Open data parameters related to the data store and data_id
+Open data parameters related to the data store and data_id.

 ### xr_merge_params
 Parameters of `xarry.merge` needed if harmonization of multiple datasets into one
@@ -165,6 +167,9 @@ Desired format of the saved datacube.
 **Default:** `zarr`
 **Allowed values:** `netcdf`, `zarr`, [`levels`](https://xcube.readthedocs.io/en/latest/mldatasets.html#the-xcube-levels-format)

+### chunksize
+Optional mapping from dimension to chunk size.
+
 ### spatial_resample_params
 This section enables user to define the parameters for spatial resampling of
 the dataset. For a full list of supported parameters, refer to the

docs/index.md

Lines changed: 25 additions & 4 deletions
@@ -62,13 +62,34 @@ For further examples please view the [examples folder](https://github.com/xcube-
 * support preload API for [xcube-clms](https://github.com/xcube-dev/xcube-clms) and
   [xcube-zendoo](https://github.com/xcube-dev/xcube-zenodo)
 * allow to write to netcdf and zarr
+* some auxiliary functionalities which shall help to setup a config YAML file.
+* interpolate along the time axis

-> The following features will be implemented in the future:
+### Configuration Generator GUI

-* some auxiliary functionalities which shall help to setup a config YAML file.
-* interpolate along the time axis
+The **Configuration Generator GUI** provides an interactive interface for creating and
+editing the configuration YAML, making the setup process more intuitive and less
+error-prone.

-### License
+**Key features (in development):**
+
+- Display of all available fields for each configuration section
+- Dynamic fetching and updating of valid parameters and inputs
+- Dropdown menus that show only supported options
+- Autofill assistance for large option sets (e.g., thousands of data IDs)
+- Built-in configuration validator/checker
+- Geolocation visualization to help define bounding boxes
+
+> **Note:** This feature is under active development, and only a minimal working
+> example is currently available.
+
+To launch the GUI, run the following command from the package root:
+
+```bash
+panel serve xcube_multistore/gui/app.py --dev
+```
+
+## License

 The package is open source and released under the
 [MIT license](https://opensource.org/license/mit). :heart:

environment.yml

Lines changed: 2 additions & 2 deletions
@@ -13,13 +13,13 @@ dependencies:
   - tabulate
   - xarray
   - xcube >=1.9.0
-  - xcube-resampling
+  - xcube-resampling >=0.2.3
   - yaml
   # data store plugins
   - xcube-cci >=0.11.2
   - xcube-cds >=1.0.0
   - xcube-clms >=0.2.2
-  - xcube-stac >=1.1.0
+  - xcube-stac >=1.1.2
   - xcube-zenodo >=1.0.0
   # Development Dependencies - Tools
   - black

test/accessors/test_cds.py

Lines changed: 46 additions & 20 deletions
@@ -20,40 +20,66 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.

-from xcube.core.store import new_data_store
-from xcube.util.jsonschema import JsonObjectSchema,JsonNumberSchema
+import unittest
+from unittest.mock import MagicMock
+
 import xarray as xr
-from unittest.mock import patch
+from xcube.core.store import DataStoreError

 from xcube_multistore.accessors.cds import CdsAccessor
-from ..sample_data import get_sample_data_3d

-import unittest
+from ..sample_data import get_sample_data_3d


 class CdsAccessorTest(unittest.TestCase):

     def setUp(self):
-        ds_3d = get_sample_data_3d()
-        memory_store = new_data_store("memory", root="datasource")
-        memory_store.write_data(ds_3d, "era5_dataset3.zarr", replace=True)
-        self.accesor = CdsAccessor(memory_store)
+        self.ds_3d = get_sample_data_3d()
+        self.accessor = CdsAccessor(MagicMock(), MagicMock(), "era5-land", MagicMock())

     def test_open_data(self):
-        ds = self.accesor.open_data("era5_dataset3.zarr")
-        self.assertIsInstance(ds, xr.Dataset)
-        self.assertCountEqual(["band_1"], ds.data_vars)
-        self.assertEqual([10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]])
+        self.accessor.store.open_data.return_value = self.ds_3d
+        ds = self.accessor.open_data(
+            "era5-land", time_range=("2025-01-01", "2025-01-10")
+        )
+        xr.testing.assert_equal(self.ds_3d, ds)
+        self.accessor.store.open_data.assert_called_once()

-    @patch("xcube.core.store.fs.store.BaseFsDataStore.open_data")
-    @patch("xcube.core.store.fs.store.BaseFsDataStore.get_open_data_params_schema")
-    def test_open_data_spatial_res(self, mock_open_params_schema, mock_open_data):
-        mock_open_params_schema.return_value = JsonObjectSchema(
-            properties=dict(spatial_res=JsonNumberSchema(minimum=0.1 ))
+    def test_open_data_spatial_res(self):
+        self.accessor.store.open_data.return_value = self.ds_3d
+        ds = self.accessor.open_data(
+            "era5-land",
+            time_range=("2025-01-01", "2025-01-10"),
+            point=(5.0, 40.0),
         )
-        mock_open_data.return_value = get_sample_data_3d()
-        ds = self.accesor.open_data("era5_dataset3.zarr", point=(5.0, 40.0))
         self.assertIsInstance(ds, xr.Dataset)
         self.assertCountEqual(["band_1"], ds.data_vars)
         self.assertCountEqual(("time",), ds.dims)
         self.assertEqual([10], [ds.sizes["time"]])
+
+    def test_open_data_splits(self):
+        # Fail first call, succeed sequentially on split halves
+        def side_effect(*args, **kwargs):
+            time_range = kwargs.get("time_range")
+            if time_range == ("2025-01-01", "2025-01-10"):
+                raise Exception("Too large request")
+            if time_range == ("2025-01-01", "2025-01-05"):
+                return self.ds_3d.isel(time=slice(0, 5))
+            if time_range == ("2025-01-06", "2025-01-10"):
+                return self.ds_3d.isel(time=slice(5, 10))
+            raise AssertionError("Unexpected time_range")
+
+        self.accessor.store.open_data.side_effect = side_effect
+        ds = self.accessor.open_data(
+            "era5-land", time_range=("2025-01-01", "2025-01-10")
+        )
+        xr.testing.assert_equal(self.ds_3d, ds)
+
+    def test_open_with_split_base_case_error(self):
+        self.accessor.store.open_data.side_effect = Exception("fail always")
+
+        with self.assertRaises(DataStoreError) as cm:
+            _ = self.accessor.open_data(
+                "era5-land", time_range=("2025-01-01", "2025-01-10")
+            )
+        self.assertIn("Cannot further split time range", str(cm.exception))
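
The splitting behaviour exercised by `test_open_data_splits` and `test_open_with_split_base_case_error` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the accessor's actual implementation: `split_time_range`, `open_with_split`, and `SplitError` are hypothetical names (the tests show a `DataStoreError` being raised), and the day-based midpoint is an assumption chosen so that the full range yields the two halves the test expects.

```python
from datetime import date, timedelta


class SplitError(Exception):
    """Hypothetical stand-in for xcube's DataStoreError in this sketch."""


def split_time_range(time_range):
    """Split an inclusive (start, end) ISO date range into two halves.

    Assumption: whole-day granularity; the second half starts the day
    after the first half ends.
    """
    start, end = (date.fromisoformat(t) for t in time_range)
    if start >= end:
        # A single day cannot be divided any further.
        raise SplitError(f"Cannot further split time range {time_range}")
    mid = start + timedelta(days=(end - start).days // 2)
    first = (start.isoformat(), mid.isoformat())
    second = ((mid + timedelta(days=1)).isoformat(), end.isoformat())
    return first, second


def open_with_split(open_fn, time_range):
    """Call open_fn; on any failure, retry recursively on the two halves.

    Returns the list of successfully opened pieces in time order.
    """
    try:
        return [open_fn(time_range=time_range)]
    except Exception:
        first, second = split_time_range(time_range)
        return open_with_split(open_fn, first) + open_with_split(open_fn, second)
```

The sketch returns the pieces as a list; since the test compares the result with the full sample dataset, the real accessor presumably combines the pieces along `time` before returning them.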

test/accessors/test_clms.py

Lines changed: 13 additions & 8 deletions
@@ -20,15 +20,15 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.

-from xcube.core.store import new_data_store, PreloadedDataStore
-from xcube.util.jsonschema import JsonObjectSchema,JsonNumberSchema
+import unittest
+from unittest.mock import MagicMock
+
 import xarray as xr
-from unittest.mock import patch
+from xcube.core.store import new_data_store

 from xcube_multistore.accessors.clms import ClmsAccessor
-from ..sample_data import get_sample_data_3d

-import unittest
+from ..sample_data import get_sample_data_3d


 class ClmsAccessorTest(unittest.TestCase):
@@ -37,11 +37,16 @@ def setUp(self):
         self.ds_3d = get_sample_data_3d()
         memory_store = new_data_store("memory", root="datasource")
         memory_store.cache_store = new_data_store("memory", root="cache_datadource")
-        memory_store.cache_store.write_data(self.ds_3d, "clms_storage|clms_dataset.zarr", replace=True)
-        self.accesor = ClmsAccessor(memory_store)
+        memory_store.cache_store.write_data(
+            self.ds_3d, "clms_storage|clms_dataset.zarr", replace=True
+        )
+        storage_store = new_data_store("file", root="data")
+        self.accesor = ClmsAccessor(memory_store, storage_store, "test", MagicMock())

     def test_open_data_cache_store(self):
         ds = self.accesor.open_data("clms_storage|clms_dataset.zarr")
         self.assertIsInstance(ds, xr.Dataset)
         self.assertCountEqual(["band_1"], ds.data_vars)
-        self.assertEqual([10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]])
+        self.assertEqual(
+            [10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]]
+        )

test/accessors/test_stac.py

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# MIT License
+#
+# Copyright (c) 2025 Brockmann Consult GmbH
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import unittest
+from unittest.mock import MagicMock
+
+import xarray as xr
+from xcube.core.store import new_data_store
+
+from xcube_multistore.accessors.stac import StacAccessor
+
+from ..sample_data import get_sample_data_2d
+
+
+class StacAccessorTest(unittest.TestCase):
+
+    def setUp(self):
+        self.ds_2d = get_sample_data_2d()
+        self.storage = new_data_store("memory", root="data")
+        self.accessor = StacAccessor(MagicMock(), self.storage, "test", MagicMock())
+
+    def test_open_data(self):
+        self.accessor.store.open_data.return_value = self.ds_2d
+        ds = self.accessor.open_data(
+            "sentinel-2-l2a",
+            bbox=[9, 54, 11, 56],
+            time_range=("2025-01-01", "2025-01-31"),
+            spatial_res=0.0001,
+        )
+        self.assertIsInstance(ds, xr.Dataset)
+        self.assertCountEqual(["band_1"], ds.data_vars)
+        self.assertCountEqual(("time", "lat", "lon"), ds.dims)
+        self.assertEqual(
+            [9, 9, 9], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]]
+        )
+
+    def test_open_data_point_request(self):
+        self.accessor.store.open_data.return_value = self.ds_2d
+        ds = self.accessor.open_data(
+            "sentinel-2-l2a",
+            point=(10, 55),
+            bbox_width=4000,
+            time_range=("2020-01-01", "2020-12-31"),
+            spatial_res=10,
+            asset_names=["B02"],
+        )
+        self.assertIsInstance(ds, xr.Dataset)
+        self.assertCountEqual(["band_1"], ds.data_vars)
+        self.assertCountEqual(("time", "lat", "lon"), ds.dims)
+        self.assertEqual(
+            [3, 9, 9], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]]
+        )

test/accessors/test_zenodo.py

Lines changed: 16 additions & 7 deletions
@@ -20,13 +20,15 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.

-from xcube.core.store import new_data_store
+import unittest
+from unittest.mock import MagicMock
+
 import xarray as xr
+from xcube.core.store import new_data_store

 from xcube_multistore.accessors.zenodo import ZenodoAccessor
-from ..sample_data import get_sample_data_3d

-import unittest
+from ..sample_data import get_sample_data_3d


 class ZenodoAccessorTest(unittest.TestCase):
@@ -36,17 +38,24 @@ def setUp(self):
         memory_store = new_data_store("memory", root="datasource")
         memory_store.write_data(self.ds_3d, "dataset.zarr", replace=True)
         memory_store.cache_store = new_data_store("memory", root="cache_datadource")
-        memory_store.cache_store.write_data(self.ds_3d, "zenodo_cache/dataset.zarr", replace=True)
-        self.accesor = ZenodoAccessor(memory_store)
+        memory_store.cache_store.write_data(
+            self.ds_3d, "zenodo_cache/dataset.zarr", replace=True
+        )
+        storage_store = new_data_store("file", root="data")
+        self.accesor = ZenodoAccessor(memory_store, storage_store, "test", MagicMock())

     def test_open_data(self):
         ds = self.accesor.open_data("dataset.zarr")
         self.assertIsInstance(ds, xr.Dataset)
         self.assertCountEqual(["band_1"], ds.data_vars)
-        self.assertEqual([10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]])
+        self.assertEqual(
+            [10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]]
+        )

     def test_open_data_cache_store(self):
         ds = self.accesor.open_data("zenodo_cache/dataset.zarr")
         self.assertIsInstance(ds, xr.Dataset)
         self.assertCountEqual(["band_1"], ds.data_vars)
-        self.assertEqual([10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]])
+        self.assertEqual(
+            [10, 3, 3], [ds.sizes["time"], ds.sizes["lat"], ds.sizes["lon"]]
+        )

test/test_config.py

Lines changed: 1 addition & 3 deletions
@@ -20,7 +20,6 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.

-import sys
 import unittest
 from unittest.mock import patch

@@ -123,6 +122,5 @@ def test_is_jupyter_notebook(self):
         self.assertFalse(_is_jupyter_notebook())

         with patch("IPython.get_ipython") as mock_get_ipython:
-            mock_get_ipython.return_value = object() # anything non-None
+            mock_get_ipython.return_value = object() # anything non-None
             self.assertTrue(_is_jupyter_notebook())
-