Timeout when reading data from Virtualizarr over HTTPS #2174

@rossawslater

I'm trying to pull a time series from the NetCDF file below, which is stored on a THREDDS server and accessed over HTTPS. I have parsed the file using VirtualiZarr and written it to Icechunk. The data are spatially chunked, and I can easily load a single chunk for one time step, or a few hundred time steps for a point, but requesting anything over ~2000 time steps (8760 total) gives a timeout error (see below).

The file is available at: https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc

Script:

import xarray as xr
import icechunk as ic

from obstore.store import from_url
from obspec_utils.registry import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser

base_url = "https://cordex.dmi.dk"
path = "thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116"
file = "tas_ANT-12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc"
url = f"{base_url}/{path}/{file}"
print(f"URL: {url}")

store = from_url(base_url)
registry = ObjectStoreRegistry({base_url: store})
parser = HDFParser()

vds = open_virtual_dataset(
    url=url,
    parser=parser,
    registry=registry,
)


storage = ic.local_filesystem_storage(
    path="./test_racmo_tas_hourly.icechunk",
)

config = ic.config.RepositoryConfig.default()

config.set_virtual_chunk_container(
    ic.virtual.VirtualChunkContainer(f"{base_url}/{path}/", ic.storage.http_store())
)

credentials = ic.credentials.containers_credentials({f"{base_url}/{path}/": None})

repo = ic.Repository.create(storage, config)

session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
session.commit("Initial commit.")


repo = ic.Repository.open(storage, config, credentials)
session = repo.readonly_session("main")

ds = xr.open_zarr(session.store)
ds.tas.sel(rlat=-10, rlon=170, method="nearest").load()

Timeout error:

IcechunkError:   x error fetching virtual reference
  | 
  | context:
  |    0: icechunk::store::get
  |            with key="tas/c/918/0/0" byte_range=From(0)
  |              at icechunk/src/store.rs:183
  | 
  |-> error fetching virtual reference
  |-> Generic HTTP error: Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 30.00062629s - HTTP error: error sending request
  |-> Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 30.00062629s - HTTP error: error sending request
  |-> HTTP error: error sending request
  |-> HTTP error: error sending request
  |-> error sending request
  `-> operation timed out

I've tried adjusting the timeout in Icechunk via

config.set_virtual_chunk_container(
    ic.virtual.VirtualChunkContainer(
        f"{base_url}/{path}/",
        ic.storage.http_store(opts={"timeout": "300s"}),
    )
)

Which then raises a different error:

IcechunkError:   x error fetching virtual reference
  | 
  | context:
  |    0: icechunk::store::get
  |            with key="tas/c/995/0/0" byte_range=From(0)
  |              at icechunk/src/store.rs:183
  | 
  |-> error fetching virtual reference
  |-> Generic HTTP error: Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 60.120950817s - Server returned non-2xx status code: 504 Gateway Timeout: <html>
  |   <head><title>504 Gateway Time-out</title></head>
  |   <body>
  |   <center><h1>504 Gateway Time-out</h1></center>
  |   <hr><center>nginx</center>
  |   </body>
  |   </html>
  |   
  |-> Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 60.120950817s - Server returned non-2xx status code: 504 Gateway Timeout: <html>
  |   <head><title>504 Gateway Time-out</title></head>
  |   <body>
  |   <center><h1>504 Gateway Time-out</h1></center>
  |   <hr><center>nginx</center>
  |   </body>
  |   </html>
  |   
  `-> Server returned non-2xx status code: 504 Gateway Timeout: <html>
      <head><title>504 Gateway Time-out</title></head>
      <body>
      <center><h1>504 Gateway Time-out</h1></center>
      <hr><center>nginx</center>
      </body>
      </html>

I had a bit of a play with the Icechunk concurrency (via config.storage.concurrency) to see if that would help, but it didn't seem to have any impact.

It would be great to know if there are any strategies that can help with this, such as splitting or staggering requests, or alternatively whether I've likely reached a server-side limitation.
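To make the "splitting / staggering" idea concrete, here is a minimal sketch of what I have in mind. It is not an Icechunk or VirtualiZarr feature: `time_batches` and `load_in_batches` are hypothetical helpers, the batch size, pause, and retry count are guesses, and the commented usage assumes the time dimension is named `time`. Whether this avoids the 504 at all depends on what the server is actually limiting.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def time_batches(n_steps: int, batch_size: int) -> Iterator[slice]:
    """Yield consecutive slices that together cover range(n_steps)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_steps, batch_size):
        yield slice(start, min(start + batch_size, n_steps))


def load_in_batches(
    load_fn: Callable[[slice], T],
    n_steps: int,
    batch_size: int = 500,   # assumed value, tune against the server
    pause_s: float = 1.0,    # assumed pause between batches
    max_retries: int = 3,    # assumed retry budget per batch
) -> List[T]:
    """Call load_fn on each time slice in turn, pausing between batches
    and retrying a failed batch with exponential backoff."""
    parts: List[T] = []
    for i, sl in enumerate(time_batches(n_steps, batch_size)):
        if i > 0:
            time.sleep(pause_s)  # stagger successive batches
        for attempt in range(max_retries):
            try:
                parts.append(load_fn(sl))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up on this batch and surface the error
                time.sleep(pause_s * 2 ** attempt)  # back off, then retry
    return parts


# Hypothetical usage with the dataset from the script above
# (assumes the time dimension is called "time"):
#
# point = ds.tas.sel(rlat=-10, rlon=170, method="nearest")
# parts = load_in_batches(
#     lambda sl: point.isel(time=sl).load(),
#     n_steps=point.sizes["time"],
# )
# series = xr.concat(parts, dim="time")
```

Since a few hundred time steps load fine on their own, a batch size in that range seems like the natural starting point, but I have not verified that this works against this server.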

Environment

python 3.14.3
xarray v2026.2.0
obstore v0.9.1
obspec_utils v0.9.0
virtualizarr v2.4.0
