I'm trying to pull a time series from the NetCDF file below, which is stored on a THREDDS server and accessed via HTTPS. I have parsed the file using VirtualiZarr and written the references to Icechunk. Data are spatially chunked, and I can easily load a single chunk for one time step, or a few hundred time steps for a point, but requesting anything over ~2000 time steps (8760 in total) gives a timeout error (see below).
The file is available at: https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc
Script:
import xarray as xr
import icechunk as ic
from obstore.store import from_url
from obspec_utils.registry import ObjectStoreRegistry
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
base_url = "https://cordex.dmi.dk"
path = "thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116"
file = "tas_ANT-12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc"
url = f"{base_url}/{path}/{file}"
print(f"URL: {url}")
store = from_url(base_url)
registry = ObjectStoreRegistry({base_url: store})
parser = HDFParser()
vds = open_virtual_dataset(
    url=url,
    parser=parser,
    registry=registry,
)
storage = ic.local_filesystem_storage(
path="./test_racmo_tas_hourly.icechunk",
)
config = ic.config.RepositoryConfig.default()
config.set_virtual_chunk_container(
ic.virtual.VirtualChunkContainer(f"{base_url}/{path}/", ic.storage.http_store())
)
credentials = ic.credentials.containers_credentials({f"{base_url}/{path}/": None})
repo = ic.Repository.create(storage, config)
session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
session.commit("Initial commit.")
repo = ic.Repository.open(storage, config, credentials)
session = repo.readonly_session("main")
ds = xr.open_zarr(session.store)
ds.tas.sel(rlat=-10, rlon=170, method="nearest").load()
Timeout error:
IcechunkError: x error fetching virtual reference
|
| context:
| 0: icechunk::store::get
| with key="tas/c/918/0/0" byte_range=From(0)
| at icechunk/src/store.rs:183
|
|-> error fetching virtual reference
|-> Generic HTTP error: Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
| 12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 30.00062629s - HTTP error: error sending request
|-> Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
| 12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 30.00062629s - HTTP error: error sending request
|-> HTTP error: error sending request
|-> HTTP error: error sending request
|-> error sending request
`-> operation timed out
I've tried raising the timeout in Icechunk via:
config.set_virtual_chunk_container(
    ic.virtual.VirtualChunkContainer(
        f"{base_url}/{path}/",
        ic.storage.http_store(opts={"timeout": "300s"}),
    )
)
This then raises a different error, a 504 Gateway Timeout returned by the server after about 60 s:
IcechunkError: x error fetching virtual reference
|
| context:
| 0: icechunk::store::get
| with key="tas/c/995/0/0" byte_range=From(0)
| at icechunk/src/store.rs:183
|
|-> error fetching virtual reference
|-> Generic HTTP error: Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
| 12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 60.120950817s - Server returned non-2xx status code: 504 Gateway Timeout: <html>
| <head><title>504 Gateway Time-out</title></head>
| <body>
| <center><h1>504 Gateway Time-out</h1></center>
| <hr><center>nginx</center>
| </body>
| </html>
|
|-> Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
| 12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 60.120950817s - Server returned non-2xx status code: 504 Gateway Timeout: <html>
| <head><title>504 Gateway Time-out</title></head>
| <body>
| <center><h1>504 Gateway Time-out</h1></center>
| <hr><center>nginx</center>
| </body>
| </html>
|
`-> Server returned non-2xx status code: 504 Gateway Timeout: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
I also experimented with the Icechunk concurrency settings (via config.storage.concurrency) to see if that would help, but it didn't seem to have any impact.
It would be great to know if there are any strategies that can help with this, such as splitting / staggering requests, or alternatively whether I've likely reached a server-side limitation.
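To make the splitting / staggering idea concrete, here is a minimal sketch of what I have in mind. The helper names (time_batches, load_staggered), the batch size, the pause, and the retry count are all hypothetical and untested against this server. It also assumes the time dimension is called "time". Whether this would actually avoid the 504 is exactly what I'm unsure about.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def time_batches(n_steps: int, batch_size: int) -> Iterator[slice]:
    """Yield consecutive index slices that together cover range(n_steps)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_steps, batch_size):
        yield slice(start, min(start + batch_size, n_steps))


def load_staggered(
    load_batch: Callable[[slice], T],
    n_steps: int,
    batch_size: int = 500,   # assumed value, well below the ~2000 that fails
    pause_s: float = 1.0,    # assumed pause between batches
    max_retries: int = 3,    # assumed retry budget per batch
) -> List[T]:
    """Load one batch at a time, pausing between batches and retrying failures."""
    parts: List[T] = []
    for sl in time_batches(n_steps, batch_size):
        for attempt in range(max_retries):
            try:
                parts.append(load_batch(sl))
                break
            except Exception:
                # Give up after the last attempt, otherwise back off and retry.
                if attempt == max_retries - 1:
                    raise
                time.sleep(pause_s * 2 ** attempt)
        time.sleep(pause_s)
    return parts


# Intended use with the dataset from the script above (not run here):
# point = ds.tas.sel(rlat=-10, rlon=170, method="nearest")
# parts = load_staggered(lambda sl: point.isel(time=sl).load(), point.sizes["time"])
# ts = xr.concat(parts, dim="time")
```

Batches are loaded strictly one after another, so this trades speed for a lower request rate on the server.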
Environment
python 3.14.3
xarray v2026.2.0
obstore v0.9.1
obspec_utils v0.9.0
virtualizarr v2.4.0