I would like to be able to read virtual references back out from an icechunk store into VirtualiZarr ManifestArray objects.
Note this issue is the icechunk equivalent of zarr-developers/VirtualiZarr#118, which is about reading kerchunk references into ManifestArray objects.
The main use case is appending new data to an existing store (see zarr-developers/VirtualiZarr#21 (comment)), so that when some new data arrives (e.g. a new grib file with today's weather data), I can add an updated snapshot just with something like:
import virtualizarr as vz
import xarray as xr
# avoids re-extracting all the metadata from all the past grib files, so should be quick
existing_vds = vz.open_virtual_dataset(icechunkstore, reader='icechunk')
new_vds = vz.open_virtual_dataset('todays_weather.grib', reader='grib')
updated_vds = xr.concat([existing_vds, new_vds], dim='time')
# commit new snapshot that includes today's data
# requires https://github.com/earth-mover/icechunk/issues/103
updated_vds.virtualize.to_icechunk(icechunkstore)
icechunkstore.commit('<todays-date>')
In order to implement that Icechunk reader for virtualizarr I would need some API for getting all virtual (and non-virtual) references for a snapshot back out of the Icechunk store, ideally as a vz.ManifestArray or something I can cheaply coerce to one (see ChunkManifest.from_arrays()).
Writing the updated references as a new snapshot also requires #103.
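To make the "cheaply coerce" part concrete, here is a minimal sketch of the kind of conversion I have in mind. It assumes, purely hypothetically, that Icechunk could return the references for one array as a mapping from chunk grid index to (path, offset, length). The `refs_to_arrays` helper, that mapping shape, and the example paths are all illustrative and not an existing Icechunk or VirtualiZarr API. The three row-major sequences it builds correspond to the paths/offsets/lengths inputs that ChunkManifest.from_arrays() expects (numpy arrays there; plain lists here to stay dependency-free).

```python
from itertools import product


def refs_to_arrays(refs, grid_shape):
    """Flatten {chunk_index: (path, offset, length)} into row-major lists.

    Hypothetical helper: `refs` stands in for whatever Icechunk might return.
    Chunks with no reference are filled with ("", 0, 0), which is an
    assumption about how missing chunks would be represented.
    """
    for index in refs:
        if len(index) != len(grid_shape) or any(
            not 0 <= i < n for i, n in zip(index, grid_shape)
        ):
            raise ValueError(f"chunk index {index} is outside grid {grid_shape}")

    paths, offsets, lengths = [], [], []
    # product() iterates the last axis fastest, i.e. row-major (C) order.
    for index in product(*(range(n) for n in grid_shape)):
        path, offset, length = refs.get(index, ("", 0, 0))
        paths.append(path)
        offsets.append(offset)
        lengths.append(length)
    return paths, offsets, lengths


# Made-up references for a 2x2 chunk grid with one missing chunk.
refs = {
    (0, 0): ("s3://bucket/day1.grib", 0, 100),
    (0, 1): ("s3://bucket/day1.grib", 100, 100),
    (1, 0): ("s3://bucket/day2.grib", 0, 120),
}
paths, offsets, lengths = refs_to_arrays(refs, grid_shape=(2, 2))
```

Reshaping each list to the chunk grid shape would then give the dense arrays a manifest needs, without touching the original grib files.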
(I guess the .virtualize.to_icechunk method might also need to know to do array.resize in this example; see the Append example in this notebook.)
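As a rough sketch of the bookkeeping a resize-on-append would involve (this is not what .virtualize.to_icechunk does today, and `append_along_axis` and `shift_chunk_index` are hypothetical names), the new array shape and the grid position of the appended chunks can be computed like this, assuming the existing data ends exactly on a chunk boundary:

```python
def append_along_axis(old_shape, new_shape, chunks, axis):
    """Return (resized_shape, chunk_offset) for appending along `axis`.

    Hypothetical helper. `chunk_offset` is how many chunks already exist
    along `axis`, i.e. where the first appended chunk lands in the grid.
    """
    if not (len(old_shape) == len(new_shape) == len(chunks)):
        raise ValueError("shapes and chunks must have the same number of dimensions")
    for i, (old, new) in enumerate(zip(old_shape, new_shape)):
        if i != axis and old != new:
            raise ValueError(f"size mismatch on non-append axis {i}: {old} != {new}")
    if old_shape[axis] % chunks[axis] != 0:
        raise ValueError("existing data must end on a chunk boundary")

    resized = tuple(
        old + new if i == axis else old
        for i, (old, new) in enumerate(zip(old_shape, new_shape))
    )
    chunk_offset = old_shape[axis] // chunks[axis]
    return resized, chunk_offset


def shift_chunk_index(index, axis, chunk_offset):
    """Move a chunk index from the new data into the resized array's grid."""
    return tuple(i + chunk_offset if d == axis else i for d, i in enumerate(index))


# A year of daily data plus one new day, one chunk per time step (made-up sizes).
resized, chunk_offset = append_along_axis(
    old_shape=(365, 180, 360),
    new_shape=(1, 180, 360),
    chunks=(1, 180, 360),
    axis=0,
)
first_new_chunk = shift_chunk_index((0, 0, 0), axis=0, chunk_offset=chunk_offset)
```

Only the references for the appended chunks would need to be written at the shifted indices; the existing references stay where they are.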
Running the open/concat/commit snippet above as a cron job or event-driven serverless function should go a long way towards making ingestion of regularly updated data archives easier. (cc @mpiannucci)
This feature might also be useful to allow using icechunk as a serialization format during large serverless reductions (xref zarr-developers/VirtualiZarr#123).
cc @paraseba