Commit 6871f17

Merge main
2 parents aa2d399 + a2a52d2 commit 6871f17

20 files changed

+840
-355
lines changed

docs/docs/icechunk-python/configuration.md

Lines changed: 9 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -69,11 +69,11 @@ config.storage = icechunk.StorageSettings(
6969

7070
### [`virtual_chunk_containers`](./reference.md#icechunk.RepositoryConfig.virtual_chunk_containers)
7171

72-
Icechunk allows repos to contain [virtual chunks](./virtual.md). To allow for referencing these virtual chunks, you can configure the `virtual_chunk_containers` parameter to specify the storage locations and configurations for any virtual chunks. Each virtual chunk container is specified by a [`VirtualChunkContainer`](./reference.md#icechunk.VirtualChunkContainer) object which contains a name, a url prefix, and a storage configuration. When a container is added to the settings, any virtual chunks with a url that starts with the configured prefix will use the storage configuration for that matching container.
72+
Icechunk allows repos to contain [virtual chunks](./virtual.md). To allow for referencing these virtual chunks, you must configure the `virtual_chunk_containers` parameter to specify the storage locations and configurations for any virtual chunks. Each virtual chunk container is specified by a [`VirtualChunkContainer`](./reference.md#icechunk.VirtualChunkContainer) object which contains a url prefix and a storage configuration. When a container is added to the settings, any virtual chunks with a url that starts with the configured prefix will use the storage configuration for that matching container.
7373

7474
!!! note
7575

76-
Currently only `s3` compatible storage and `local_filesystem` storage are supported for virtual chunk containers. Other storage backends such as `gcs`, `azure`, and `https` are on the roadmap.
76+
Currently only `s3` compatible storage, `gcs`, `local_filesystem` and `http[s]` storages are supported for virtual chunk containers. Other storage backends such as `azure` are on the roadmap.
7777

7878
#### Example
7979

@@ -82,7 +82,6 @@ For example, if we wanted to configure an icechunk repo to be able to contain vi
8282
```python
8383
config.virtual_chunk_containers = [
8484
icechunk.VirtualChunkContainer(
85-
name="my-s3-bucket",
8685
url_prefix="s3://my-s3-bucket/",
8786
storage=icechunk.StorageSettings(
8887
storage=icechunk.s3_storage(bucket="my-s3-bucket", region="us-east-1"),
@@ -96,7 +95,6 @@ If we also wanted to configure the repo to be able to contain virtual chunks fro
9695
```python
9796
config.set_virtual_chunk_container(
9897
icechunk.VirtualChunkContainer(
99-
name="my-other-s3-bucket",
10098
url_prefix="s3://my-other-s3-bucket/",
10199
storage=icechunk.StorageSettings(
102100
storage=icechunk.s3_storage(bucket="my-other-s3-bucket", region="us-west-2"),
@@ -105,11 +103,11 @@ config.set_virtual_chunk_container(
105103
)
106104
```
107105

108-
Now at read time, if icechunk encounters a virtual chunk url that starts with `s3://my-other-s3-bucket/`, it will use the storage configuration for the `my-other-s3-bucket` container.
106+
Now at read time, if Icechunk encounters a virtual chunk url that starts with `s3://my-other-s3-bucket/`, it will use the storage configuration for the `my-other-s3-bucket` container.
109107

110108
!!! note
111109

112-
While virtual chunk containers specify the storage configuration for any virtual chunks, they do not contain any authentication information. The credentials must also be specified when opening the repository using the [`virtual_chunk_credentials`](./reference.md#icechunk.Repository.open) parameter. See the [Virtual Chunk Credentials](#virtual-chunk-credentials) section for more information.
110+
While virtual chunk containers specify the storage configuration for any virtual chunks, they do not contain any authentication information. The credentials must also be specified when opening the repository using the [`authorize_virtual_chunk_access`](./reference.md#icechunk.Repository.open) parameter. This parameter also serves as a way for the user to authorize access to the virtual chunk containers: containers that are not explicitly allowed with `authorize_virtual_chunk_access` won't be able to fetch their chunks. See the [Virtual Chunk Credentials](#virtual-chunk-credentials) section for more information.
113111

114112
### [`manifest`](./reference.md#icechunk.RepositoryConfig.manifest)
115113

@@ -269,21 +267,22 @@ The next time this repo is opened, the persisted config will be loaded by defaul
269267

270268
## Virtual Chunk Credentials
271269

272-
When using virtual chunk containers, the credentials for the storage backend must also be specified. This is done using the [`virtual_chunk_credentials`](./reference.md#icechunk.Repository.open) parameter when creating or opening the repo. Credentials are specified as a dictionary of container names mapping to credential objects. A helper function, [`containers_credentials`](./reference.md#icechunk.containers_credentials), is provided to make it easier to specify credentials for multiple containers.
270+
When using virtual chunk containers, the containers must be authorized by the repo user, and the credentials for the storage backend must be specified. This is done using the [`authorize_virtual_chunk_access`](./reference.md#icechunk.Repository.open) parameter when creating or opening the repo. Credentials are specified as a dictionary of container url prefixes mapping to credential objects or `None`. A `None` credential will fetch credentials from the process environment or it will use anonymous credentials if the container allows it. A helper function, [`containers_credentials`](./reference.md#icechunk.containers_credentials), is provided to make it easier to specify credentials for multiple containers.
273271

274272
### Example
275273

276274
Expanding on the example from the [Virtual Chunk Containers](#virtual-chunk-containers) section, we can configure the repo to use the credentials for the `my-s3-bucket` and `my-other-s3-bucket` containers.
277275

278276
```python
279277
credentials = icechunk.containers_credentials(
280-
my_s3_bucket=icechunk.s3_credentials(bucket="my-s3-bucket", region="us-east-1"),
281-
my_other_s3_bucket=icechunk.s3_credentials(bucket="my-other-s3-bucket", region="us-west-2"),
278+
{ "s3://my-s3-bucket/": icechunk.s3_credentials(bucket="my-s3-bucket", region="us-east-1"),
279+
"s3://my-other-s3-bucket/": icechunk.s3_credentials(bucket="my-other-s3-bucket", region="us-west-2"),
280+
}
282281
)
283282

284283
repo = icechunk.Repository.open(
285284
storage=storage,
286285
config=config,
287-
virtual_chunk_credentials=credentials,
286+
authorize_virtual_chunk_access=credentials,
288287
)
289288
```
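The authorization rule described in this file's diff can be sketched in plain Python. This is only an illustration under stated assumptions, not Icechunk's implementation: `resolve_container` and `credentials_for` are hypothetical helpers, containers are reduced to their url prefix strings, and picking the longest matching prefix is an assumption of the sketch.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_container(url: str, prefixes: Sequence[str]) -> Optional[str]:
    """Return the longest configured prefix that `url` starts with, or None."""
    matches = [p for p in prefixes if url.startswith(p)]
    return max(matches, key=len) if matches else None


def credentials_for(
    url: str,
    prefixes: Sequence[str],
    authorized: Mapping[str, Optional[Any]],
) -> Optional[Any]:
    """Look up credentials for a virtual chunk url, refusing unauthorized containers."""
    prefix = resolve_container(url, prefixes)
    if prefix is None:
        raise LookupError(f"no virtual chunk container matches {url}")
    if prefix not in authorized:
        # Mirrors the documented behaviour: a container that was not authorized cannot fetch chunks.
        raise PermissionError(f"container {prefix} is not authorized")
    # A value of None stands for "use environment or anonymous credentials".
    return authorized[prefix]


prefixes = ["s3://my-s3-bucket/", "s3://my-other-s3-bucket/"]
authorized = {"s3://my-s3-bucket/": None}  # only the first container is authorized

print(credentials_for("s3://my-s3-bucket/data/file.nc", prefixes, authorized))
try:
    credentials_for("s3://my-other-s3-bucket/data/file.nc", prefixes, authorized)
except PermissionError as err:
    print(err)
```

The point of the sketch is that the keys of the authorization mapping have to equal the containers' url prefixes exactly; a key that differs by a character leaves the container unauthorized.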

docs/docs/icechunk-python/virtual.md

Lines changed: 49 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -92,12 +92,12 @@ storage = icechunk.local_filesystem_storage(
9292
)
9393

9494
config = icechunk.RepositoryConfig.default()
95-
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("s3", "s3://", icechunk.s3_store(region="us-east-1")))
96-
credentials = icechunk.containers_credentials(s3=icechunk.s3_credentials(anonymous=True))
95+
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("s3://mybucket/my/data/", icechunk.s3_store(region="us-east-1")))
96+
credentials = icechunk.containers_credentials({"s3://mybucket/my/data/": icechunk.s3_credentials(anonymous=True)})
9797
repo = icechunk.Repository.create(storage, config, credentials)
9898
```
9999

100-
With the repo created, lets write our virtual dataset to Icechunk with VirtualiZarr!
100+
With the repo created and the virtual chunk container added, let's write our virtual dataset to Icechunk with VirtualiZarr!
101101

102102
```python
103103
session = repo.writable_session("main")
@@ -150,13 +150,17 @@ ds.sst.isel(time=26, zlev=0).plot(x='lon', y='lat', vmin=0)
150150

151151
![oisst](../assets/datasets/oisst.png)
152152

153+
!!! note
154+
155+
Users of the repo will need to enable the virtual chunk container by passing the `authorize_virtual_chunk_access` argument to `Repository.open`. This way, the repo user flags the container as authorized. The argument must be a dict using url prefixes as keys and optional credentials as values. If the container requires no credentials, `None` can be used as the value in the map. Failing to authorize a container will generate an error when a chunk is fetched from it.
156+
153157
## Virtual Reference API
154158

155159
While `VirtualiZarr` is the easiest way to create virtual datasets with Icechunk, the Store API that it uses to create the datasets in Icechunk is public. `IcechunkStore` contains a [`set_virtual_ref`](./reference.md#icechunk.IcechunkStore.set_virtual_ref) method that specifies a virtual ref for a specified chunk.
156160

157161
### Virtual Reference Storage Support
158162

159-
Currently, Icechunk supports two types of storage for virtual references:
163+
Currently, Icechunk supports four types of storage for virtual references:
160164

161165
#### S3 Compatible
162166

@@ -167,13 +171,48 @@ References to files accessible via S3 compatible storage.
167171
Here is how we can set the chunk at key `c/0` to point to a file on an s3 bucket, `mybucket`, with the prefix `my/data/file.nc`:
168172

169173
```python
170-
store.set_virtual_ref('c/0', 's3://mybucket/my/data/file.nc', offset=1000, length=200)
174+
config = icechunk.RepositoryConfig.default()
175+
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("s3://mybucket/my/data", icechunk.s3_store(region="us-east-1")))
176+
repo = icechunk.Repository.create(storage, config)
177+
session = repo.writable_session("main")
178+
session.store.set_virtual_ref('c/0', 's3://mybucket/my/data/file.nc', offset=1000, length=200)
171179
```
172180

173181
##### Configuration
174182

175183
S3 virtual references require configuring credentials for the store to be able to access the specified s3 bucket. See [the configuration docs](./configuration.md#virtual-reference-storage-config) for instructions.
176184

185+
#### GCS
186+
187+
References to files accessible on Google Cloud Storage.
188+
189+
##### Example
190+
191+
Here is how we can set the chunk at key `c/0` to point to a file on a GCS bucket, `mybucket`, with the prefix `my/data/file.nc`:
192+
193+
```python
194+
config = icechunk.RepositoryConfig.default()
195+
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("gcs://mybucket/my/data", icechunk.gcs_store(options={})))
196+
repo = icechunk.Repository.create(storage, config)
197+
session = repo.writable_session("main")
198+
session.store.set_virtual_ref('c/0', 'gcs://mybucket/my/data/file.nc', offset=1000, length=200)
199+
```
200+
201+
#### HTTP
202+
203+
References to files accessible via the http(s) protocol.
204+
205+
##### Example
206+
207+
Here is how we can set the chunk at key `c/0` to point to a file on `myserver`, with the prefix `my/data/file.nc`:
208+
209+
```python
210+
config = icechunk.RepositoryConfig.default()
211+
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("https://myserver/my/data", icechunk.http_store(options={})))
212+
repo = icechunk.Repository.create(storage, config)
213+
session = repo.writable_session("main")
214+
session.store.set_virtual_ref('c/0', 'https://myserver/my/data/file.nc', offset=1000, length=200)
215+
```
177216

178217
#### Local Filesystem
179218

@@ -184,7 +223,11 @@ References to files accessible via local filesystem. This requires any file path
184223
Here is how we can set the chunk at key `c/0` to point to a file on my local filesystem located at `/path/to/my/file.nc`:
185224

186225
```python
187-
store.set_virtual_ref('c/0', 'file:///path/to/my/file.nc', offset=20, length=100)
226+
config = icechunk.RepositoryConfig.default()
227+
config.set_virtual_chunk_container(icechunk.VirtualChunkContainer("file:///path/to/my/", icechunk.local_filesystem_store("/path/to/my")))
228+
repo = icechunk.Repository.create(storage, config)
229+
session = repo.writable_session("main")
230+
session.store.set_virtual_ref('c/0', 'file:///path/to/my/file.nc', offset=20, length=100)
188231
```
189232

190233
No extra configuration is necessary for local filesystem references.
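
The examples in this file register url prefixes both with a trailing slash (`s3://mybucket/my/data/`) and without one (`s3://mybucket/my/data`). Assuming containers are selected by a plain string-prefix test, as the configuration docs describe ("a url that starts with the configured prefix"), the two forms are not equivalent. The sketch below uses only `str.startswith`; it is not Icechunk code, and the `data-private` path is a hypothetical neighbour.

```python
def matches(url: str, url_prefix: str) -> bool:
    """Plain string-prefix test; an assumption about how a container is selected."""
    return url.startswith(url_prefix)


chunk = "s3://mybucket/my/data/file.nc"
sibling = "s3://mybucket/my/data-private/file.nc"  # hypothetical neighbouring path

without_slash = "s3://mybucket/my/data"
with_slash = "s3://mybucket/my/data/"

# Without the trailing slash, the sibling directory is matched as well.
print(matches(chunk, without_slash), matches(sibling, without_slash))  # True True
# With the trailing slash, only urls under my/data/ match.
print(matches(chunk, with_slash), matches(sibling, with_slash))  # True False
```

If the prefix is meant to cover a single directory, ending it with `/` keeps sibling paths out of the container under this assumption.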

icechunk-python/python/icechunk/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,7 @@
7878
http_store,
7979
in_memory_storage,
8080
local_filesystem_storage,
81+
local_filesystem_store,
8182
r2_storage,
8283
s3_storage,
8384
s3_store,
@@ -151,6 +152,7 @@
151152
"in_memory_storage",
152153
"initialize_logs",
153154
"local_filesystem_storage",
155+
"local_filesystem_store",
154156
"print_debug_info",
155157
"r2_storage",
156158
"s3_anonymous_credentials",

icechunk-python/python/icechunk/_icechunk_python.pyi

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1237,23 +1237,23 @@ class PyRepository:
12371237
storage: Storage,
12381238
*,
12391239
config: RepositoryConfig | None = None,
1240-
virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
1240+
authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
12411241
) -> PyRepository: ...
12421242
@classmethod
12431243
def open(
12441244
cls,
12451245
storage: Storage,
12461246
*,
12471247
config: RepositoryConfig | None = None,
1248-
virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
1248+
authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
12491249
) -> PyRepository: ...
12501250
@classmethod
12511251
def open_or_create(
12521252
cls,
12531253
storage: Storage,
12541254
*,
12551255
config: RepositoryConfig | None = None,
1256-
virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
1256+
authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
12571257
) -> PyRepository: ...
12581258
@staticmethod
12591259
def exists(storage: Storage) -> bool: ...

icechunk-python/python/icechunk/credentials.py

Lines changed: 23 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
import pickle
22
from collections.abc import Callable, Mapping
33
from datetime import datetime
4+
from typing import cast
45

56
from icechunk._icechunk_python import (
67
AzureCredentials,
@@ -341,14 +342,14 @@ def azure_credentials(
341342

342343

343344
def containers_credentials(
344-
m: Mapping[str, AnyS3Credential] = {}, **kwargs: AnyS3Credential
345-
) -> dict[str, Credentials.S3]:
345+
m: Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None],
346+
) -> dict[str, AnyCredential | None]:
346347
"""Build a map of credentials for virtual chunk containers.
347348
348349
Parameters
349350
----------
350-
m: Mapping[str, AnyS3Credential]
351-
A mapping from container name to credentials.
351+
m: Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None]
352+
A mapping from container url prefixes to credentials.
352353
353354
Examples
354355
--------
@@ -365,24 +366,36 @@ def containers_credentials(
365366
s3_compatible=True,
366367
force_path_style=True,
367368
)
368-
container = ic.VirtualChunkContainer("s3", "s3://", virtual_store_config)
369+
container = ic.VirtualChunkContainer("s3://somebucket", virtual_store_config)
369370
config.set_virtual_chunk_container(container)
370371
credentials = ic.containers_credentials(
371-
s3=ic.s3_credentials(access_key_id="ACCESS_KEY", secret_access_key="SECRET")
372+
{"s3://somebucket": ic.s3_credentials(access_key_id="ACCESS_KEY", secret_access_key="SECRET")}
372373
)
373374
374375
repo = ic.Repository.create(
375376
storage=ic.local_filesystem_storage(store_path),
376377
config=config,
377-
virtual_chunk_credentials=credentials,
378+
authorize_virtual_chunk_access=credentials,
378379
)
379380
```
380381
381382
"""
382-
res = {}
383-
for name, cred in {**m, **kwargs}.items():
384-
if isinstance(cred, AnyS3Credential):
383+
res: dict[str, AnyCredential | None] = {}
384+
for name, cred in m.items():
385+
if cred is None:
386+
res[name] = None
387+
elif isinstance(cred, AnyS3Credential):
385388
res[name] = Credentials.S3(cred)
389+
elif (
390+
isinstance(cred, GcsCredentials.FromEnv)
391+
or isinstance(cred, GcsCredentials.Static)
392+
or isinstance(cred, GcsCredentials.Refreshable)
393+
):
394+
res[name] = Credentials.Gcs(cast(GcsCredentials, cred))
395+
elif isinstance(cred, AzureCredentials.FromEnv) or isinstance(
396+
cred, AzureCredentials.Static
397+
):
398+
res[name] = Credentials.Azure(cast(AzureCredentials, cred))
386399
else:
387400
raise ValueError(f"Unknown credential type {type(cred)}")
388401
return res
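
This change drops the `**kwargs` form of `containers_credentials` and keys the mapping by url prefix instead of container name, so a call such as `containers_credentials(s3=...)` has to be re-keyed. A minimal sketch of that re-keying with plain dicts follows; `rekey_by_prefix` and the `name_to_prefix` table are hypothetical helpers that are not part of Icechunk, and strings stand in for credential objects.

```python
from typing import Mapping, Optional, TypeVar

C = TypeVar("C")


def rekey_by_prefix(
    by_name: Mapping[str, Optional[C]],
    name_to_prefix: Mapping[str, str],
) -> dict[str, Optional[C]]:
    """Turn a name-keyed credential mapping into a url-prefix-keyed one."""
    missing = sorted(set(by_name) - set(name_to_prefix))
    if missing:
        raise KeyError(f"no url prefix known for containers: {missing}")
    # None values are kept: they mean "environment or anonymous credentials".
    return {name_to_prefix[name]: cred for name, cred in by_name.items()}


old_style = {"s3": "placeholder-credential", "public": None}
prefixes = {"s3": "s3://somebucket", "public": "https://myserver/my/data"}

new_style = rekey_by_prefix(old_style, prefixes)
print(new_style)
```

The resulting dict has the shape that the new signature accepts as its single positional argument.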

icechunk-python/python/icechunk/repository.py

Lines changed: 27 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ def create(
2727
cls,
2828
storage: Storage,
2929
config: RepositoryConfig | None = None,
30-
virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
30+
authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
3131
) -> Self:
3232
"""
3333
Create a new Icechunk repository.
@@ -43,8 +43,13 @@ def create(
4343
The storage configuration for the repository.
4444
config : RepositoryConfig, optional
4545
The repository configuration. If not provided, a default configuration will be used.
46-
virtual_chunk_credentials : dict[str, AnyCredential], optional
47-
Credentials for virtual chunks.
46+
authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
47+
Authorize Icechunk to access virtual chunks in these containers. A mapping
48+
from container url_prefix to the credentials to use to access chunks in
49+
that container. If a credential is `None`, it will be fetched from the
50+
environment, or anonymous credentials will be used if the container allows it.
51+
As a security measure, Icechunk will block access to virtual chunks if the
52+
container is not authorized using this argument.
4853
4954
Returns
5055
-------
@@ -55,7 +60,7 @@ def create(
5560
PyRepository.create(
5661
storage,
5762
config=config,
58-
virtual_chunk_credentials=virtual_chunk_credentials,
63+
authorize_virtual_chunk_access=authorize_virtual_chunk_access,
5964
)
6065
)
6166

@@ -64,7 +69,7 @@ def open(
6469
cls,
6570
storage: Storage,
6671
config: RepositoryConfig | None = None,
67-
virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
72+
authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
6873
) -> Self:
6974
"""
7075
Open an existing Icechunk repository.
@@ -82,8 +87,13 @@ def open(
8287
config : RepositoryConfig, optional
8388
The repository settings. If not provided, a default configuration will be
8489
loaded from the repository.
85-
virtual_chunk_credentials : dict[str, AnyCredential], optional
86-
Credentials for virtual chunks.
90+
authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
91+
Authorize Icechunk to access virtual chunks in these containers. A mapping
92+
from container url_prefix to the credentials to use to access chunks in
93+
that container. If a credential is `None`, it will be fetched from the
94+
environment, or anonymous credentials will be used if the container allows it.
95+
As a security measure, Icechunk will block access to virtual chunks if the
96+
container is not authorized using this argument.
8797
8898
Returns
8999
-------
@@ -94,7 +104,7 @@ def open(
94104
PyRepository.open(
95105
storage,
96106
config=config,
97-
virtual_chunk_credentials=virtual_chunk_credentials,
107+
authorize_virtual_chunk_access=authorize_virtual_chunk_access,
98108
)
99109
)
100110

@@ -103,7 +113,7 @@ def open_or_create(
103113
cls,
104114
storage: Storage,
105115
config: RepositoryConfig | None = None,
106-
virtual_chunk_credentials: dict[str, AnyCredential] | None = None,
116+
authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
107117
) -> Self:
108118
"""
109119
Open an existing Icechunk repository or create a new one if it does not exist.
@@ -122,8 +132,13 @@ def open_or_create(
122132
config : RepositoryConfig, optional
123133
The repository settings. If not provided, a default configuration will be
124134
loaded from the repository.
125-
virtual_chunk_credentials : dict[str, AnyCredential], optional
126-
Credentials for virtual chunks.
135+
authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
136+
Authorize Icechunk to access virtual chunks in these containers. A mapping
137+
from container url_prefix to the credentials to use to access chunks in
138+
that container. If a credential is `None`, it will be fetched from the
139+
environment, or anonymous credentials will be used if the container allows it.
140+
As a security measure, Icechunk will block access to virtual chunks if the
141+
container is not authorized using this argument.
127142
128143
Returns
129144
-------
@@ -134,7 +149,7 @@ def open_or_create(
134149
PyRepository.open_or_create(
135150
storage,
136151
config=config,
137-
virtual_chunk_credentials=virtual_chunk_credentials,
152+
authorize_virtual_chunk_access=authorize_virtual_chunk_access,
138153
)
139154
)
140155
