🪣 Support for OBject Storage#1021
Conversation
Co-authored-by: Eugene <ymatviych@bnl.gov>
|
The only remaining big thing here is to run a MinIO container on GHA so we can test against it. |
tiled/storage.py
Outdated
| def get_storage(uri: str) -> Storage: | ||
| "Look up Storage by URI." | ||
| return _STORAGE[uri] | ||
| def get_storage(uri: str) -> Storage | Tuple[Storage, str]: |
There was a problem hiding this comment.
This ambiguity in return type could be a signal that the design isn't quite right.
If the problem is, "How can we get the path back to the user?" what are our other options? What would the consequences be of adding a path attribute to ObjectStorage?
tiled/storage.py
Outdated
| elif scheme in {"sqlite", "duckdb"}: | ||
| return EmbeddedSQLStorage(uri) | ||
| elif scheme == "http": | ||
| # Split on the first single '/' that is not part of '://' |
There was a problem hiding this comment.
This works but it's a lot to digest:
- regex
- string emptiness check
lstripreplace
Above, we have the parsed URL urlparse(uri), which we use to get the scheme. If we keep a reference to that in some local variable, parsed_uri, we can easily grab the path:
full_path = parsed_uri.path # includes bucket and the rest
bucket_name, blob_path = full_path.split("/", 1)Aside: If you haven't seen it before, the second argument to split works like this:
>>> 'a/b/c'.split('/')
['a', 'b', 'c']
>>> 'a/b/c'.split('/', 1) # split on the first instance of '/' only
['a', 'b/c']Now, to get the "bucket only" version of the URL:
from urllib.parse import urlunparse
# Get the URL without the blob_path, just the bucket.
bucket_uri = urlunparse(parsed_uri._replace(path='/' + bucket_name))
# Look up storage by bucket URI.
storage = _STORAGE[bucket_uri]
# Return a copy encapsulating the blob_path.
return storage.with_blob_path(blob_path) # Note: This method would need to be added to ObjectStorage.Example:
>>> uri = 'https://example.com/bucket/a/b/c'
>>> parsed_uri = urlparse(uri)
>>> parsed_uri
ParseResult(scheme='https', netloc='example.com', path='/bucket/a/b/c', params='', query='', fragment='')
>>> full_path = parsed_uri.path
>>> full_path
'/bucket/a/b/c'
>>> full_path.lstrip('/')
'bucket/a/b/c'
>>> bucket_name, blob_path = full_path.lstrip('/').split('/', 1)
>>> bucket_name
'bucket'
>>> blob_path
'a/b/c'refactor `ObjectStorage` and clean up code
|
Pertaining to the minio testing container:
|
danielballan
left a comment
There was a problem hiding this comment.
I like that we leave Azure and Google off now, until they can be tested, but we have a clear path for adding support soon.
And I like the clarity of this:
I hope to test drive this before clicking the green merge button, but this will go in in time for maint, barring any surprises. Well done!
* 🎈 *writes data to get teh party started* * 🌫️ *anxiously adds more cloud providers* * Resolve mypy errors * 👍️ Resolve minio https error preventing us from writing `zarr.json` * 🚮 Experiment with writing (sloppy) data * 🪲 DEBUG: problems with `write` * 🕶️ Review : Add missing prefix Co-authored-by: Eugene <ymatviych@bnl.gov> * ✍️ Write regex helper function * 🧽 refactor to clean up repeated code * ✍️ Add Blobs to writing tests * ✍️ Rewrite `get_storage` to be a router for buckets * refactor ObjectStorage * 🐋 Add minio container to CI for testing * 🧪 Make `TILED_TEST_BUCKET` env var for advanced testing * More refactoring of Storage * FIX: look up registered storages instead of recreating them * Simplify test config * TST: fix test_writing + more refactoring * MNT: add minio dependency for server * ENH: generalize asset deletion --------- Co-authored-by: Eugene <ymatviych@bnl.gov>
Resolves #905