Timeout when reading data from Virtualizarr over HTTPS

I'm trying to pull a time series from the NetCDF file below, which is stored on a THREDDS server, over HTTPS. I have parsed the file using VirtualiZarr and written it to Icechunk. The data are spatially chunked, and I can easily load a single chunk for one time step, or a few hundred time steps for a point, but requesting anything over ~2000 time steps (of 8760 in total) gives a timeout error (see below).

The file is available at: `https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc`

Script:
```python
import xarray as xr
import icechunk as ic

from obstore.store import from_url
from obspec_utils.registry import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser

base_url = "https://cordex.dmi.dk"
path = "thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116"
file = "tas_ANT-12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc"
url = f"{base_url}/{path}/{file}"
print(f"URL: {url}")

store = from_url(base_url)
registry = ObjectStoreRegistry({base_url: store})
parser = HDFParser()

vds = open_virtual_dataset(
    url=url,
    parser=parser,
    registry=registry,
)


storage = ic.local_filesystem_storage(
    path="./test_racmo_tas_hourly.icechunk",
)

config = ic.config.RepositoryConfig.default()

config.set_virtual_chunk_container(
    ic.virtual.VirtualChunkContainer(f"{base_url}/{path}/", ic.storage.http_store())
)

credentials = ic.credentials.containers_credentials({f"{base_url}/{path}/": None})

repo = ic.Repository.create(storage, config)

session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
session.commit("Initial commit.")


repo = ic.Repository.open(storage, config, credentials)
session = repo.readonly_session("main")

ds = xr.open_zarr(session.store)
ds.tas.sel(rlat=-10, rlon=170, method="nearest").load()
```
Timeout error:
```
IcechunkError:   x error fetching virtual reference
  | 
  | context:
  |    0: icechunk::store::get
  |            with key="tas/c/918/0/0" byte_range=From(0)
  |              at icechunk/src/store.rs:183
  | 
  |-> error fetching virtual reference
  |-> Generic HTTP error: Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 30.00062629s - HTTP error: error sending request
  |-> Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 30.00062629s - HTTP error: error sending request
  |-> HTTP error: error sending request
  |-> HTTP error: error sending request
  |-> error sending request
  `-> operation timed out
```

I've tried adjusting the timeout in Icechunk via

```python
config.set_virtual_chunk_container(
    ic.virtual.VirtualChunkContainer(
        f"{base_url}/{path}/",
        ic.storage.http_store(opts={"timeout": "300s"}),
    )
)
```

This then raises a different error, a 504 Gateway Timeout returned by nginx on the server side after about 60 s:

```
IcechunkError:   x error fetching virtual reference
  | 
  | context:
  |    0: icechunk::store::get
  |            with key="tas/c/995/0/0" byte_range=From(0)
  |              at icechunk/src/store.rs:183
  | 
  |-> error fetching virtual reference
  |-> Generic HTTP error: Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 60.120950817s - Server returned non-2xx status code: 504 Gateway Timeout: <html>
  |   <head><title>504 Gateway Time-out</title></head>
  |   <body>
  |   <center><h1>504 Gateway Time-out</h1></center>
  |   <hr><center>nginx</center>
  |   </body>
  |   </html>
  |   
  |-> Error performing GET https://cordex.dmi.dk/thredds/fileServer/esg_cordex/PolarRes/ANT-12/UU-IMAU/ERA5/evaluation/r1i1p1f1/RACMO24P-NN/v1-r1/1hr/tas/v20260116/tas_ANT-
  |   12_ERA5_evaluation_r1i1p1f1_UU-IMAU_RACMO24P-NN_v1-r1_1hr_197901010000-197912312300.nc in 60.120950817s - Server returned non-2xx status code: 504 Gateway Timeout: <html>
  |   <head><title>504 Gateway Time-out</title></head>
  |   <body>
  |   <center><h1>504 Gateway Time-out</h1></center>
  |   <hr><center>nginx</center>
  |   </body>
  |   </html>
  |   
  `-> Server returned non-2xx status code: 504 Gateway Timeout: <html>
      <head><title>504 Gateway Time-out</title></head>
      <body>
      <center><h1>504 Gateway Time-out</h1></center>
      <hr><center>nginx</center>
      </body>
      </html>
```

I had a bit of a play with the Icechunk concurrency (via `config.storage.concurrency`) to see if that would help, but it didn't seem to have any impact.

It would be great to know whether there are any strategies that can help with this, such as splitting or staggering requests, or whether I've likely hit a server-side limitation.
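
To make the "splitting / staggering" idea concrete, here is roughly what I have in mind. This is only a minimal sketch, and I have not confirmed that it avoids the timeouts. The names `load_in_batches`, `fetch`, `batch_size` and `pause` are made up for illustration, and the default batch size of 500 is just a guess based on a few hundred time steps loading fine.

```python
import time
from typing import Callable, List, Sequence


def load_in_batches(
    fetch: Callable[[int, int], Sequence],
    n_steps: int,
    batch_size: int = 500,
    max_retries: int = 3,
    pause: float = 1.0,
) -> List:
    """Fetch the index range [0, n_steps) in consecutive batches.

    `fetch(start, stop)` is a user-supplied callable (hypothetical here) that
    returns the values for that half-open range. Each failed batch is retried
    with exponential backoff before giving up.
    """
    results: List = []
    for start in range(0, n_steps, batch_size):
        stop = min(start + batch_size, n_steps)
        for attempt in range(1, max_retries + 1):
            try:
                results.extend(fetch(start, stop))
                break
            except Exception:  # assumption: narrow this to the real error type
                if attempt == max_retries:
                    raise
                # Back off before retrying the same batch.
                time.sleep(pause * 2 ** (attempt - 1))
        # Stagger successive batches so the server is not hit back to back.
        if stop < n_steps:
            time.sleep(pause)
    return results


# Hypothetical usage with the dataset from the script above (not run here),
# assuming the time dimension is named "time":
#
# point = ds.tas.sel(rlat=-10, rlon=170, method="nearest")
# values = load_in_batches(
#     lambda a, b: point.isel(time=slice(a, b)).values,
#     n_steps=point.sizes["time"],
# )
```

If the 504 comes from a hard limit on the server, batching like this would only reduce how much work is lost per failure rather than remove the failures.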


_Environment_
```
python 3.14.3
xarray v2026.2.0
obstore v0.9.1
obspec_utils v0.9.0
virtualizarr v2.4.0
```

Timeout when reading data from Virtualizarr over HTTPS #2174
