🪣 Support for OBject Storage #1021

Merged
danielballan merged 24 commits into bluesky:main from Kezzsim:obtsor
Oct 28, 2025

Conversation

@Kezzsim
Contributor

@Kezzsim Kezzsim commented Jul 17, 2025

Resolves #905

@Kezzsim Kezzsim changed the title from "🎈 *writes data to get teh party started*" to "🪣 Support for Bucket Storage" Jul 17, 2025
@Kezzsim Kezzsim marked this pull request as draft July 17, 2025 15:14
@Kezzsim Kezzsim changed the title from "🪣 Support for Bucket Storage" to "🪣 Support for OBject Storage" Jul 17, 2025
@danielballan danielballan added this to the v0.1.0 release milestone Aug 5, 2025
@danielballan danielballan removed this from the v0.1.0 release milestone Aug 28, 2025
@danielballan
Member

The only remaining big thing here is to run a MinIO container on GHA so we can test against it.

tiled/storage.py Outdated
- def get_storage(uri: str) -> Storage:
-     "Look up Storage by URI."
-     return _STORAGE[uri]
+ def get_storage(uri: str) -> Storage | Tuple[Storage, str]:
Member

This ambiguity in return type could be a signal that the design isn't quite right.

If the problem is, "How can we get the path back to the user?" what are our other options? What would the consequences be of adding a path attribute to ObjectStorage?
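
To make that second question concrete, here is a minimal sketch, not the design that was merged. It assumes a simplified, hypothetical `ObjectStorage` with only two fields (`uri` and an optional `path`); the real class in tiled may look quite different. With the path carried on the object, a lookup can return a single type and callers never have to unpack a tuple.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectStorage:
    """Hypothetical stand-in for tiled's ObjectStorage; the fields are assumptions."""

    uri: str  # bucket-level URI
    path: Optional[str] = None  # blob path inside the bucket, if any


def describe(storage: ObjectStorage) -> str:
    """Rebuild the full URI from a single object, with no tuple unpacking."""
    if storage.path is None:
        return storage.uri
    return f"{storage.uri}/{storage.path}"
```

The trade-off is that two `ObjectStorage` instances can then refer to the same bucket, so any registry keyed by URI would need to key on the bucket-level `uri` only.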

tiled/storage.py Outdated
    elif scheme in {"sqlite", "duckdb"}:
        return EmbeddedSQLStorage(uri)
    elif scheme == "http":
        # Split on the first single '/' that is not part of '://'
Member

This works but it's a lot to digest:

  • regex
  • string emptiness check
  • lstrip
  • replace

Above, we have the parsed URL urlparse(uri), which we use to get the scheme. If we keep a reference to that in some local variable, parsed_uri, we can easily grab the path:

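full_path = parsed_uri.path  # includes bucket and the rest, with a leading '/'
# Strip the leading '/' first; otherwise the first element of the split is ''.
bucket_name, blob_path = full_path.lstrip("/").split("/", 1)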

Aside: If you haven't seen it before, the second argument to split works like this:

>>> 'a/b/c'.split('/')
['a', 'b', 'c']
>>> 'a/b/c'.split('/', 1)  # split on the first instance of '/' only
['a', 'b/c']

Now, to get the "bucket only" version of the URL:

from urllib.parse import urlunparse

# Get the URL without the blob_path, just the bucket.
bucket_uri = urlunparse(parsed_uri._replace(path='/' + bucket_name))
# Look up storage by bucket URI.
storage = _STORAGE[bucket_uri]
# Return a copy encapsulating the blob_path.
return storage.with_blob_path(blob_path)  # Note: This method would need to be added to ObjectStorage.

Example:

>>> uri = 'https://example.com/bucket/a/b/c'
>>> parsed_uri = urlparse(uri)
>>> parsed_uri
ParseResult(scheme='https', netloc='example.com', path='/bucket/a/b/c', params='', query='', fragment='')

>>> full_path = parsed_uri.path
>>> full_path
'/bucket/a/b/c'

>>> full_path.lstrip('/')
'bucket/a/b/c'

>>> bucket_name, blob_path = full_path.lstrip('/').split('/', 1)
>>> bucket_name
'bucket'
>>> blob_path
'a/b/c'

@Kezzsim
Contributor Author

Kezzsim commented Oct 23, 2025

Pertaining to the minio testing container:

  • Specify Object fixture in test_writing.py
  • Set fixture to xfail when minio url isn't available
  • Add GitHub Actions to set up the minio container
  • Add code to reset the state of the bucket after each test
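
As an illustration of the second item, a fixture that marks tests as expected failures when no MinIO endpoint is configured could be sketched roughly as below. This is not the code from the PR. The `TILED_TEST_BUCKET` environment variable name comes from the commit list, but the format of its value is an assumption, and the helper `bucket_uri_from_env` and the fixture name `object_storage_uri` are hypothetical.

```python
import os
from typing import Mapping, Optional

import pytest


def bucket_uri_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured test bucket URI, or None if unset or empty."""
    return environ.get("TILED_TEST_BUCKET") or None


@pytest.fixture
def object_storage_uri() -> str:
    """Hypothetical fixture: xfail any test that needs a bucket when none is set."""
    uri = bucket_uri_from_env()
    if uri is None:
        # pytest.xfail raises immediately, so the test body never runs.
        pytest.xfail("TILED_TEST_BUCKET is not set; no MinIO bucket to test against")
    return uri
```

Whether `pytest.xfail` or `pytest.skip` is the better fit is a judgment call: a skip reports the test as not run, while an xfail reports it as an expected failure.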

@Kezzsim Kezzsim marked this pull request as ready for review October 27, 2025 18:49
Member

@danielballan danielballan left a comment

I like that we leave Azure and Google off now, until they can be tested, but we have a clear path for adding support soon.

https://github.com/bluesky/tiled/pull/1021/files#diff-38e7f2525a2fda64425a32d8b78379eedfbb2a2d3bd147e5c6f0571be0210201R29

And I like the clarity of this:

https://github.com/bluesky/tiled/pull/1021/files#diff-1fb9039deb7e0ac14eb3afa15f93579ebeef9858ca8275a7ccef55ad6b52277eR218-R219

I hope to test drive this before clicking the green merge button, but this will go in in time for maint, barring any surprises. Well done!

@danielballan danielballan merged commit c4b1693 into bluesky:main Oct 28, 2025
11 checks passed
ZohebShaikh pushed a commit that referenced this pull request Feb 21, 2026
* 🎈 *writes data to get teh party started*

* 🌫️ *anxiously adds more cloud providers*

* Resolve mypy errors

* 👍️ Resolve minio https error preventing us from writing `zarr.json`

* 🚮 Experiment with writing (sloppy) data

* 🪲 DEBUG: problems with `write`

* 🕶️ Review : Add missing prefix

Co-authored-by: Eugene <ymatviych@bnl.gov>

* ✍️ Write regex helper function

* 🧽 refactor to clean up repeated code

* ✍️ Add Blobs to writing tests

* ✍️ Rewrite `get_storage` to be a router for buckets

* refactor ObjectStorage

* 🐋 Add minio container to CI for testing

* 🧪 Make `TILED_TEST_BUCKET` env var for advanced testing

* More refactoring of Storage

* FIX: look up registered storages instead of recreating them

* Simplify test config

* TST: fix test_writing + more refactoring

* MNT: add minio dependency for server

* ENH: generalize asset deletion

---------

Co-authored-by: Eugene <ymatviych@bnl.gov>

Development

Successfully merging this pull request may close these issues.

Support writable blob storage

3 participants