Problem
Currently when authorizing access to virtual chunk containers for a private bucket during Repository.open() these two kwargs values are equivalent:
authorize_virtual_chunk_access={"s3://my-private-bucket": None}
authorize_virtual_chunk_access={"s3://my-private-bucket/": icechunk.S3Credentials(from_env=True)}
but only the latter actually indicates clearly what you are doing: allowing Icechunk to look for local S3 credentials (which may of course be sensitive), which can then be used to read anything from the entirety of s3://my-private-bucket.
Given that the purpose of the authorize_virtual_chunk_access arg is to force users to explicitly opt-in to accessing potentially-sensitive storage locations for safety reasons, the auto-magicness of None seems undesirable.
It's also not ideal that None could be interpreted to mean "false" instead of the intended meaning here of "default". So a user might reasonably read the code above as meaning "do not authorize access to s3://my-private-bucket" when it actually means the opposite.
Consistency with other explicit API
We already have some existing API for being much more explicit about this. As well as icechunk.S3Credentials(from_env=True) we also have s3_anonymous_credentials(), allowing the (much safer) anon-only case to be clearly distinguished:
authorize_virtual_chunk_access={"s3://my-company-bucket": s3_anonymous_credentials()}
This kind of explicit sentinel is preferable, but not all our API forces you to use this explicit option, and some paths do not have a corresponding sentinel defined.
Proposed changes
I want the only way to authorize virtual chunk access to be to use an explicit sentinel. So I want None to never be an allowed value.
That would mean that all types of virtual chunk access required use of the correct sentinel object:
# anonymous S3 bucket
authorize_virtual_chunk_access={"s3://my-private-bucket/": icechunk.s3_anonymous_credentials()}
# private S3 bucket - credentials from env
authorize_virtual_chunk_access={"s3://my-private-bucket/": icechunk.S3Credentials(from_env=True)}
# private S3 bucket - explicit credentials
authorize_virtual_chunk_access={"s3://my-private-bucket/": icechunk.S3Credentials(access_key=...)}
# local filesystem - explicitly no credentials required
authorize_virtual_chunk_access={"file:///my-home-dir/": icechunk.credentials.LocalFilesystemAccess} # a new sentinel type
# local filesystem - explicitly no credentials required
authorize_virtual_chunk_access={"http://some-endpoint/": icechunk.credentials.HttpAccess} # a new sentinel type
Note the last two are new sentinel types, with names chosen to make it clear what is happening.
Deprecation cycle
Eventually we want passing authorize_virtual_chunk_access={"s3://priate-bucket/": None} to raise, but that's a breaking change for many virtual chunk users, so instead we do a deprecation cycle, where initially we merely warn on None, with the new sentinel suggested instead.
Overall the users still have to put in a similar amount of work before and after the change - they have to write store-specific boilerplate either way. The only loss of ergonomics is that technically None worked for various types of storage, whereas now their sentinel has to be storage-specific.
Problem
Currently when authorizing access to virtual chunk containers for a private bucket during
Repository.open()these two kwargs values are equivalent:but only the latter actually indicates clearly what you are doing: allowing Icechunk to look for local S3 credentials (which may of course be sensitive), which can then be used to read anything from the entirety of
s3://my-private-bucket.Given that the purpose of the
authorize_virtual_chunk_accessarg is to force users to explicitly opt-in to accessing potentially-sensitive storage locations for safety reasons, the auto-magicness ofNoneseems undesirable.It's also not ideal that
Nonecould be interpreted to mean "false" instead of the intended meaning here of "default". So a user might reasonably read the code above as meaning "do not authorize access tos3://my-private-bucket" when it actually means the opposite.Consistency with other explicit API
We already have some existing API for being much more explicit about this. As well as
icechunk.S3Credentials(from_env=True)we also haves3_anonymous_credentials(), allowing the (much safer) anon-only case to be clearly distinguished:This kind of explicit sentinel is preferable, but not all our API forces you to use this explicit option, and some paths do not have a corresponding sentinel defined.
Proposed changes
I want the only way to authorize virtual chunk access to be to use an explicit sentinel. So I want
Noneto never be an allowed value.That would mean that all types of virtual chunk access required use of the correct sentinel object:
Note the last two are new sentinel types, with names chosen to make it clear what is happening.
Deprecation cycle
Eventually we want passing
authorize_virtual_chunk_access={"s3://priate-bucket/": None}to raise, but that's a breaking change for many virtual chunk users, so instead we do a deprecation cycle, where initially we merely warn onNone, with the new sentinel suggested instead.Overall the users still have to put in a similar amount of work before and after the change - they have to write store-specific boilerplate either way. The only loss of ergonomics is that technically
Noneworked for various types of storage, whereas now their sentinel has to be storage-specific.