More efficient way to encode "fetch this whole chunk"

Currently, if we create a virtual reference that points at an entire object, it is stored internally as

```python
(path=s3://bucket/file.whatever, offset=0, length=<full_length_of_object>)
```

where `<full_length_of_object>` needs to be determined at parse time.

This leads to a big inefficiency in the `ZarrParser`, which currently does an `O(n_chunks)` iteration over all chunks in the Zarr array to discover the sizes of every chunk object. It would be great to be able to skip this iteration.

https://github.com/zarr-developers/VirtualiZarr/blob/c67dcc1b89f69d4ae59c92276a4ead5963c944ac/virtualizarr/parsers/zarr.py#L81

It is apparently possible to issue an HTTP range request for a whole object without specifying the length of the object, using an open-ended range such as `bytes=0-` (see [allowed range header syntax](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Range#syntax)). So if we instead e.g. stored the virtual reference as

```python
(path=s3://bucket/file.whatever, offset=0, length=-1)
```

and our chunk-fetching IO implementations (i.e. `ManifestStore`/Icechunk/fsspec) knew what to do with this, then we could skip getting the object size, and so avoid the `O(n_chunks)` size-discovery step entirely when virtualizing any (un-sharded) Zarr store.
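
To make the idea concrete, here is a minimal sketch of how a reader might turn such a reference into a `Range` header value. The names `WHOLE_OBJECT` and `range_header` are hypothetical and are not part of `ManifestStore`, Icechunk, or fsspec; the `-1` sentinel is just the value floated above, and the final encoding may well differ.

```python
# Hypothetical sentinel meaning "read from offset to the end of the object".
# The issue floats -1; this is an assumption, not a settled encoding.
WHOLE_OBJECT = -1


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value for a (offset, length) reference.

    HTTP byte ranges are inclusive on both ends, so a known length maps to
    "bytes=<offset>-<offset + length - 1>". Leaving the end off gives an
    open-ended range that runs to the end of the object.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length == WHOLE_OBJECT:
        # Open-ended range: no object size needed at parse time.
        return f"bytes={offset}-"
    if length <= 0:
        raise ValueError("length must be positive or the WHOLE_OBJECT sentinel")
    return f"bytes={offset}-{offset + length - 1}"


# A whole-object reference needs no size lookup:
whole = range_header(0, WHOLE_OBJECT)
# A conventional reference with a known length:
partial = range_header(100, 50)
```

With this approach the parser would only need to record the path of each chunk object, and the size would never have to be looked up unless something downstream actually needs it.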

More efficient way to encode "fetch this whole chunk" #850
