.zarr.zip on s3 #1613
Replies: 4 comments 4 replies
-
This is great @jeffpeck10x! Would you be willing to add this to the tutorial? I wonder whether the built-in ZipStore has any use for reading from S3. Or is fsspec's ZipFileSystem always required here?
-
An additional option would be to use
With thanks to @caviere (see https://github.com/zarr-developers/outreachy_2022_testing_zipstore/blob/main/real%20%20world%20data/main.py#L54). Regardless, 👍 for your version and/or this one being in the tutorial, in whichever location would have helped you find them! Thanks.
-
Made a small PR to update the docs: #1615
-
With Zarr v3, I used the following code to create a read-only Zarr store from a zipped Zarr in S3:

import zarr
import s3fs

class S3ZipStore(zarr.storage.ZipStore):
    def __init__(self, path: s3fs.S3File) -> None:
        # Initialize with a placeholder path, then point the store at the open S3 file.
        super().__init__(path="", mode="r")
        self.path = path

# S3_ENDPOINT, S3_BUCKET and ZIP_PATH must be defined by the caller.
s3 = s3fs.S3FileSystem(anon=True, endpoint_url=S3_ENDPOINT, asynchronous=False)
file = s3.open(f"s3://{S3_BUCKET}/{ZIP_PATH}")
zarr_store = S3ZipStore(file)

It relies on the fact that ZipStore only checks that […]

Edit: A load time benchmark shows that a ZipStore created from a local zip file suffers a similar performance hit compared to LocalStore:

[Bar chart: "Random Sentinel 2 patch time series load time benchmark (5100 m x 5100 m, 1 year)". Mean load time (s): S3 Zarr 7.53, S3 zipped Zarr 23.9, NVMe Zarr 1.12, NVMe zipped Zarr 3.21.]
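
To put a number on "similar performance hit", here is a small sketch that computes the zipped-versus-unzipped slowdown from the four mean load times in the benchmark. The variable names are only illustrative; the values are the ones from the chart. It comes out at roughly 3.2x on S3 and 2.9x on NVMe.

```python
# Mean load times in seconds, copied from the benchmark chart.
load_times = {
    "S3 Zarr": 7.53,
    "S3 zipped Zarr": 23.9,
    "NVMe Zarr": 1.12,
    "NVMe zipped Zarr": 3.21,
}

# Slowdown factor of the zipped store relative to the plain store on the same backend.
s3_slowdown = load_times["S3 zipped Zarr"] / load_times["S3 Zarr"]
nvme_slowdown = load_times["NVMe zipped Zarr"] / load_times["NVMe Zarr"]

print(f"S3:   zipped is {s3_slowdown:.2f}x slower")
print(f"NVMe: zipped is {nvme_slowdown:.2f}x slower")
```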
-
I searched and could not find an example of accessing a .zarr.zip from an s3 endpoint without having to first download it entirely. The provided zarr.storage.ZipStore only works on a local path (right?).

I experimented and found that this works:

I am wondering if this is alright. Is there anything that could be improved with this approach?

My use-case is read-only. I understand that this approach would not be able to handle updates without updating the entire .zarr.zip.

And more generally, if someone else is searching for a solution like this, I hope this helps!
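
For anyone wondering why a zip can be read without downloading it entirely: Python's zipfile module only needs a seekable file-like object. It reads the central directory at the end of the archive and then seeks to the members it is asked for. The sketch below shows this with the standard library only. `CountingBytesIO` is a hypothetical stand-in for a remote file object (such as the one returned by `s3.open(...)`), and the archive layout and member names are made up for illustration, not taken from a real Zarr store.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """Hypothetical stand-in for a remote file object; counts the bytes read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_fake_zarr_zip(n_chunks: int = 50, chunk_size: int = 10_000) -> bytes:
    """Build an uncompressed (ZIP_STORED) archive with one member per fake chunk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for i in range(n_chunks):
            zf.writestr(f"data/{i}", bytes([i % 256]) * chunk_size)
    return buffer.getvalue()


archive = build_fake_zarr_zip()
remote = CountingBytesIO(archive)

# Opening the archive reads the central directory; reading one member
# seeks to that member and reads only its bytes.
with zipfile.ZipFile(remote, mode="r") as zf:
    chunk = zf.read("data/7")

print(f"archive size: {len(archive)} bytes, bytes read: {remote.bytes_read}")
```

With a real remote file object each of those reads turns into a network request rather than a memory copy, so caching and request latency matter far more than in this in-memory sketch.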