---
draft: false
date: 2025-03-10
categories:
  - Release
authors:
  - kylebarron
links:
  - CHANGELOG.md
---

# Releasing obstore 0.5!

Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.

This post gives an overview of what's new in obstore version 0.5.

<!-- more -->

Refer to the [changelog](../../CHANGELOG.md) for all updates.

## Credential providers

Authentication tends to be among the trickiest, but most important, parts of connecting to object storage. There are many ways to handle credentials, and supporting every one of them natively in obstore would impose a heavy maintenance burden.

Instead, this release supports **custom credential providers**: Python callbacks that give you full control over how credentials are generated.

We'll dive into a few salient points, but make sure to read the [full authentication documentation](../../authentication.md) in the user guide.

### "Official" SDK credential providers

You can use [`Boto3CredentialProvider`][obstore.auth.boto3.Boto3CredentialProvider] to delegate credential handling to a [`boto3.Session`][boto3.session.Session].

```py
from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

# Configure the session as needed (e.g. profile or region)
session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)
```

### Custom credential providers

There's a long tail of possible authentication mechanisms, so obstore lets you supply your own custom authentication callback.

You can provide either a **synchronous or asynchronous** custom authentication function.

The simplest custom credential provider is just a function:

```py
from datetime import datetime, timedelta, UTC  # UTC requires Python 3.11+

from obstore.store import S3Credential

def get_credentials() -> S3Credential:
    return {
        "access_key_id": "...",
        "secret_access_key": "...",
        # Only required for temporary (session) credentials
        "token": "...",
        # Optional: when set, obstore refreshes the credential before this time
        "expires_at": datetime.now(UTC) + timedelta(minutes=30),
    }
```

Then pass that function as the `credential_provider` argument:

```py
S3Store(..., credential_provider=get_credentials)
```
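
Because asynchronous functions are also accepted, an async provider follows the same shape. Below is a minimal sketch. `fetch_secret` is a hypothetical stand-in for whatever awaitable call you use to obtain credentials, such as a secrets manager or an internal HTTP service. The return type is annotated as a plain `dict` only to keep the example free of obstore imports.

```python
import asyncio
from datetime import datetime, timedelta, timezone


async def fetch_secret(name: str) -> str:
    # Hypothetical async lookup; replace with your own secrets client.
    await asyncio.sleep(0)
    return f"value-of-{name}"


async def get_credentials_async() -> dict:
    # Run both lookups concurrently, then build the credential dict.
    key_id, secret = await asyncio.gather(
        fetch_secret("access_key_id"), fetch_secret("secret_access_key")
    )
    return {
        "access_key_id": key_id,
        "secret_access_key": secret,
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
```

It would be passed in the same way: `S3Store(..., credential_provider=get_credentials_async)`.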

More advanced credential providers, which may need to store state, can be class-based. See the [authentication user guide](../../authentication.md) for more information.
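
Here is a minimal sketch of the class-based pattern. It assumes obstore invokes a provider simply by calling it, as with `get_credentials` above, so any object that defines `__call__` fits. The class name and its `fetch` argument are hypothetical. The point is that the instance can keep state between calls; in this case, it tracks how often credentials were requested.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple


class CountingCredentialProvider:
    """Hypothetical stateful provider wrapping a user-supplied fetch function."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, str, str]],
        ttl: timedelta = timedelta(minutes=30),
    ) -> None:
        self._fetch = fetch  # returns (access_key_id, secret_access_key, token)
        self._ttl = ttl
        self.refresh_count = 0  # state that persists across calls

    def __call__(self) -> dict:
        key_id, secret, token = self._fetch()
        self.refresh_count += 1
        return {
            "access_key_id": key_id,
            "secret_access_key": secret,
            "token": token,
            "expires_at": datetime.now(timezone.utc) + self._ttl,
        }
```

An instance would be passed like a function: `S3Store(..., credential_provider=CountingCredentialProvider(my_fetch))`, where `my_fetch` is your own (hypothetical) fetch function.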

### Automatic token refresh

If the credential returned by the credential provider includes an `expires_at` key, obstore will **automatically** call the credential provider again to refresh the credential before that expiration time.

**Your code doesn't need to think about token expiration times!**

This makes it seamless to use something like the [AWS Security Token Service (STS)](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html), which issues temporary credentials that expire after a limited time (one hour by default for `AssumeRole`). See [`StsCredentialProvider`][obstore.auth.boto3.StsCredentialProvider] for an example of a credential provider that uses [`STS.Client.assume_role`][] to refresh tokens automatically.
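
The sketch below models this refresh behavior in plain Python. It is *not* obstore's implementation, which lives in the Rust core, and the five-minute refresh margin is an assumption chosen for illustration. A cached credential is reused until the current time comes within the margin of its `expires_at`, and then the provider is called again. A credential without `expires_at` is simply cached.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class RefreshingCredentialCache:
    """Conceptual model of refresh-before-expiry (not obstore's code)."""

    def __init__(
        self,
        provider: Callable[[], dict],
        margin: timedelta = timedelta(minutes=5),  # assumed margin
    ) -> None:
        self._provider = provider
        self._margin = margin
        self._cached: Optional[dict] = None

    def get(self, now: Optional[datetime] = None) -> dict:
        if now is None:
            now = datetime.now(timezone.utc)
        cached = self._cached
        expires_at = cached.get("expires_at") if cached is not None else None
        # Refresh on first use, or once we are within `margin` of expiry.
        needs_refresh = cached is None or (
            expires_at is not None and now >= expires_at - self._margin
        )
        if needs_refresh:
            self._cached = self._provider()
        return self._cached
```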

## Improved fsspec integration

This release also significantly improves integration with the [fsspec](https://github.com/fsspec/filesystem_spec) ecosystem.

You can now [register][obstore.fsspec.register] obstore as the default handler for supported protocols, like `s3`, `gs`, and `az`. Calling `fsspec.filesystem` or `fsspec.open` will then automatically defer to [`obstore.fsspec.FsspecStore`][] and [`obstore.fsspec.BufferedFile`][], respectively.

The fsspec integration is no longer tied to a specific bucket. Instead, [`FsspecStore`][obstore.fsspec.FsspecStore] automatically handles multiple buckets within a single protocol.

obstore's fsspec integration is now also tested with [pyarrow](../../examples/pyarrow.md).

For more information, read the [fsspec page in the user guide](../../fsspec.md).

## Improved AWS type hinting

Type hints have been improved for AWS enum-like values, such as the AWS region. When you construct an `S3Store`, editors that support type hints will now suggest valid values.

Here are two examples from VS Code:

*Screenshots: VS Code suggesting values from obstore's type hints when constructing an `S3Store`.*
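
Editor suggestions like these are typically driven by `typing.Literal` types. The sketch below assumes that approach. The `Region` alias and `build_config` helper are hypothetical and abbreviated; they are not obstore's actual definitions. Listing the known values in a `Literal` gives the editor something to suggest. Pairing that `Literal` with `str` in a union is one way to keep unlisted regions valid for the type checker.

```python
from typing import Literal, Union, get_args

# Hypothetical, abbreviated alias; the real set of AWS regions is much larger.
Region = Literal["us-east-1", "us-west-2", "eu-central-1"]

# Union with str: known regions are listed, but any string still type-checks.
RegionLike = Union[Region, str]


def build_config(region: RegionLike) -> dict:
    # Hypothetical helper; obstore's own config types are not reproduced here.
    return {"region": region}


known_regions = get_args(Region)
```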

## Benchmarking

We've continued work on [benchmarking obstore](https://github.com/geospatial-jeff/pyasyncio-benchmark).

New benchmarks run on an EC2 M5 instance indicate that obstore provides [2.8x higher throughput than aioboto3](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/40e67509a248c5102a6b1608bcb9773295691213/test_results/20250218_results/ec2_m5/aggregated_results.csv) when repeatedly fetching the first 16 KB of a file from an async context.

## Improved documentation

- [fsspec documentation](../../fsspec.md)
- [pyarrow integration](../../examples/pyarrow.md)
- [authentication documentation](../../authentication.md)

## All updates

Refer to the [changelog](../../CHANGELOG.md) for all updates.