---
draft: false
date: 2025-02-10
categories:
  - Release
authors:
  - kylebarron
links:
  - CHANGELOG.md
---

# Releasing obstore 0.4!

Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.

This post gives an overview of what's new in obstore version 0.4.

<!-- more -->

Refer to the [changelog](../../CHANGELOG.md#040-2025-02-10) for all updates.

## Easier store creation with `from_url`

There's a new top-level [`obstore.store.from_url`][] function, which makes it dead simple to create a store from a URL.

Here's an example of using it to inspect data from the [Sentinel-2 open data bucket](https://registry.opendata.aws/sentinel-2-l2a-cogs/). `from_url` automatically infers that this is an S3 path and constructs an [`S3Store`][obstore.store.S3Store], which we can pass to [`obstore.list_with_delimiter`][] and [`obstore.get`][].

```py
import obstore as obs
from obstore.store import from_url

# The base path within the bucket to "mount" to
url = "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2A_12SUF_20220601_0_L2A"

# Pass in store-specific parameters as keyword arguments
# Here, we pass `skip_signature=True` because it's a public bucket
store = from_url(url, region="us-west-2", skip_signature=True)

# Print filenames in this directory
print([meta["path"] for meta in obs.list_with_delimiter(store)["objects"]])
# ['AOT.tif', 'B01.tif', 'B02.tif', 'B03.tif', 'B04.tif', 'B05.tif', 'B06.tif', 'B07.tif', 'B08.tif', 'B09.tif', 'B11.tif', 'B12.tif', 'B8A.tif', 'L2A_PVI.tif', 'S2A_12SUF_20220601_0_L2A.json', 'SCL.tif', 'TCI.tif', 'WVP.tif', 'granule_metadata.xml', 'thumbnail.jpg', 'tileinfo_metadata.json']

# Download thumbnail
with open("thumbnail.jpg", "wb") as f:
    f.write(obs.get(store, "thumbnail.jpg").bytes())
```

And voilà, we have a thumbnail of the Grand Canyon from space:

`from_url` also supports typing overloads, so your type checker will raise an error if you try to mix AWS-specific and Azure-specific configuration.

Nevertheless, for best typing support, we still suggest using one of the store-specific `from_url` constructors (such as [`S3Store.from_url`][obstore.store.S3Store.from_url]) if you know the protocol. Then your type checker can infer the type of the returned store.

## Pickle support

One of obstore's initial integration targets is [zarr-python](https://github.com/zarr-developers/zarr-python), which needs to load large chunked N-dimensional arrays from object storage. In our [early benchmarking](https://github.com/maxrjones/zarr-obstore-performance), we've found that the [obstore-based backend](https://github.com/zarr-developers/zarr-python/pull/1661) can cut data loading times in half compared with the standard fsspec-based backend.

However, Zarr is commonly used in distributed execution environments like [Dask](https://www.dask.org/), which must be able to move store instances between workers. We've implemented [pickle](https://docs.python.org/3/library/pickle.html) support for store classes to unblock this use case. Read [our pickle documentation](../../advanced/pickle.md) for more info.

## Enhanced loading of AWS credentials (provisional)

By default, each store class expects to find credential information either in environment variables or in passed-in arguments. In the case of AWS, that means the default constructors will not look in file-based credential sources.

The provisional [`S3Store._from_native`][obstore.store.S3Store._from_native] constructor uses the [official AWS Rust configuration crate](https://docs.rs/aws-config/latest/aws_config/) to find credentials on the file system. This integration is expected to also automatically refresh temporary credentials before expiration.

This API is provisional and may change in the future. If you have any feedback, please [open an issue](https://github.com/developmentseed/obstore/issues/new/choose).

Obstore version 0.5 is expected to improve on extensible credentials by enabling users to supply arbitrary credentials through a sync or async callback function.

## Return Arrow data from `list_with_delimiter`

By default, the [`obstore.list`][] and [`obstore.list_with_delimiter`][] APIs [return standard Python `dict`s][obstore.ObjectMeta]. However, if you're listing a large bucket, the overhead of materializing all those Python objects can become significant.

[`obstore.list`][] and [`obstore.list_with_delimiter`][] now both support a `return_arrow` keyword parameter. If set to `True`, an Arrow [`RecordBatch`][arro3.core.RecordBatch] or [`Table`][arro3.core.Table] will be returned, which is both faster and more memory efficient.

## Access configuration values back from a store

There are new attributes, such as [`config`][obstore.store.S3Store.config], [`client_options`][obstore.store.S3Store.client_options], and [`retry_config`][obstore.store.S3Store.retry_config], for accessing configuration parameters _back_ from a store instance.

This example uses an [`S3Store`][obstore.store.S3Store], but the same behavior applies to [`GCSStore`][obstore.store.GCSStore] and [`AzureStore`][obstore.store.AzureStore].

```py
from obstore.store import S3Store

store = S3Store.from_url(
    "s3://ookla-open-data/parquet/performance/type=fixed/year=2024/quarter=1",
    region="us-west-2",
    skip_signature=True,
)
new_store = S3Store(
    config=store.config,
    prefix=store.prefix,
    client_options=store.client_options,
    retry_config=store.retry_config,
)
assert store.config == new_store.config
assert store.prefix == new_store.prefix
assert store.client_options == new_store.client_options
assert store.retry_config == new_store.retry_config
```

## Open remote objects as file-like readers or writers

This version adds support for opening remote objects as a [file-like](../../api/file.md) reader or writer.

```py
import os

import obstore as obs
from obstore.store import MemoryStore

# Create an in-memory store
store = MemoryStore()

# Iteratively write to the file
with obs.open_writer(store, "new_file.csv") as writer:
    writer.write(b"col1,col2,col3\n")
    writer.write(b"a,1,True\n")
    writer.write(b"b,2,False\n")
    writer.write(b"c,3,True\n")

# Open a reader from the file
reader = obs.open_reader(store, "new_file.csv")
file_length = reader.seek(0, os.SEEK_END)
print(file_length)  # 43
reader.seek(0)
buf = reader.read()
print(buf)
# Bytes(b"col1,col2,col3\na,1,True\nb,2,False\nc,3,True\n")
```

See [`obstore.open_reader`][] and [`obstore.open_writer`][] for more details. Async file-like readers and writers are also provided; see [`obstore.open_reader_async`][] and [`obstore.open_writer_async`][].

## Benchmarking

[Benchmarking is still ongoing](https://github.com/geospatial-jeff/pyasyncio-benchmark), but early results have been very promising, and we've [added documentation about our progress so far](../../performance.md).

## New examples

We've updated the documentation with more examples! We now have examples of how to use obstore with [FastAPI](../../examples/fastapi.md), [MinIO](../../examples/minio.md), and [tqdm](../../examples/tqdm.md).

We've also consolidated introductory documentation into the ["user guide"](../../getting-started.md).

## All updates

Refer to the [changelog](../../CHANGELOG.md#040-2025-02-10) for all updates.