Prepare 0.5 changelog #313

Merged: 8 commits merged on Mar 6, 2025

36 changes: 35 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,42 @@
# Changelog

## [0.5.0] - unreleased
## [0.5.0] - 2025-03-10

### New Features :magic_wand:

- **User-supplied credential callback** by @kylebarron in https://github.com/developmentseed/obstore/pull/234
- **Fsspec updates**:
  - [FEAT] Create obstore store in fsspec on demand by @machichima in https://github.com/developmentseed/obstore/pull/198
  - [FEAT] Support `df.to_parquet()` and `pd.read_parquet()` by @machichima in https://github.com/developmentseed/obstore/pull/165
  - Document fsspec integration in user guide by @kylebarron in https://github.com/developmentseed/obstore/pull/299
  - fsspec: Allow calling `register` with no arguments by @kylebarron in https://github.com/developmentseed/obstore/pull/298
- Enable pickling Bytes by @kylebarron in https://github.com/developmentseed/obstore/pull/295
- Add AWS literal type hints by @kylebarron in https://github.com/developmentseed/obstore/pull/301
- pyo3-bytes slicing by @jessekrubin in https://github.com/developmentseed/obstore/pull/249

### Breaking changes :wrench:

- Removed `S3Store.from_session` and `S3Store._from_native`. Use credential providers instead.
- Rename `AsyncFsspecStore` to `FsspecStore` by @kylebarron in https://github.com/developmentseed/obstore/pull/297
- Reduce the configuration key variants accepted as input. For example, `S3Store` previously accepted `region`, `aws_region`, `REGION`, or `AWS_REGION` for the same setting, which was confusing. Each underlying concept now accepts a single configuration key, in https://github.com/developmentseed/obstore/pull/323

### Bug fixes :bug:

- Validate input for range request by @kylebarron in https://github.com/developmentseed/obstore/pull/255

### Documentation :book:

- Update performance numbers by @kylebarron in https://github.com/developmentseed/obstore/pull/307
- Document type-only constructs by @kylebarron in https://github.com/developmentseed/obstore/pull/309, https://github.com/developmentseed/obstore/pull/311
- Add import warning admonition on ObjectStore type by @kylebarron in
- Update etag conditional put docs by @kylebarron in https://github.com/developmentseed/obstore/pull/310

### New Contributors

- @weiji14 made their first contribution in https://github.com/developmentseed/obstore/pull/272
- @machichima made their first contribution in https://github.com/developmentseed/obstore/pull/198

**Full Changelog**: https://github.com/developmentseed/obstore/compare/py-v0.4.0...py-v0.5.0

## [0.4.0] - 2025-02-10

@@ -19,11 +51,13 @@
- Enable automatic cleanup for local store, when deleting directories by @kylebarron in https://github.com/developmentseed/obstore/pull/175
- Optionally create root dir in LocalStore by @kylebarron in https://github.com/developmentseed/obstore/pull/177
- **File-like object** updates:

  - Add support for writable file-like objects by @kylebarron in https://github.com/developmentseed/obstore/pull/167
  - Updates to readable file API:

    - Support user-specified capacity in readable file-like objects by @kylebarron in https://github.com/developmentseed/obstore/pull/174
    - Expose `ObjectMeta` from readable file API by @kylebarron in https://github.com/developmentseed/obstore/pull/176

- Merge `config` and `kwargs` and validate that no configuration parameters have been passed multiple times. (https://github.com/developmentseed/obstore/pull/180, https://github.com/developmentseed/obstore/pull/182, https://github.com/developmentseed/obstore/pull/218)
- Add `__repr__` to `Bytes` class by @jessekrubin in https://github.com/developmentseed/obstore/pull/173

Binary file added docs/assets/aws_type_hint1.png
Binary file added docs/assets/aws_type_hint2.png
1 change: 0 additions & 1 deletion docs/authentication.md
@@ -67,7 +67,6 @@ from obstore.store import S3Store
session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)

```

<!-- SSO authentication.
116 changes: 116 additions & 0 deletions docs/blog/posts/obstore-0.5.md
@@ -0,0 +1,116 @@
---
draft: false
date: 2025-03-10
categories:
- Release
authors:
- kylebarron
links:
- CHANGELOG.md
---

# Releasing obstore 0.5!

Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.

This post gives an overview of what's new in obstore version 0.5.

<!-- more -->

Refer to the [changelog](../../CHANGELOG.md) for all updates.

## Credential providers

Authentication tends to be among the trickiest but most important parts of connecting to object storage. There are many ways to handle credentials, and supporting every one of them natively in Obstore would carry a high maintenance burden.

Instead, this release supports **custom credential providers**: Python callbacks that allow for full control over credential generation.

We'll dive into a few salient points, but make sure to read the [full authentication documentation](../../authentication.md) in the user guide.

### "Official" SDK credential providers

The [`Boto3CredentialProvider`][obstore.auth.boto3.Boto3CredentialProvider] lets a [`boto3.Session`][boto3.session.Session] handle credentials:

```py
from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)
```

### Custom credential providers

There's a long tail of possible authentication mechanisms. Obstore allows you to provide your own custom authentication callback.

You can provide either a **synchronous or asynchronous** custom authentication function.

The simplest custom credential provider can be just a function callback:

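```py
from __future__ import annotations

from datetime import datetime, timedelta, UTC
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checking; not needed at runtime
    from obstore.store import S3Credential

def get_credentials() -> S3Credential:
    return {
        "access_key_id": "...",
        "secret_access_key": "...",
        # Not always required
        "token": "...",
        "expires_at": datetime.now(UTC) + timedelta(minutes=30),
    }
```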

Then just pass that function into `credential_provider`:

```py
S3Store(..., credential_provider=get_credentials)
```
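An asynchronous provider follows the same pattern, declared with `async def`. The sketch below is a minimal illustration: `fetch_token_from_service` is a hypothetical coroutine standing in for whatever awaitable call your identity service provides, and the returned values are placeholders.

```py
from __future__ import annotations

import asyncio
from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checking; not needed at runtime
    from obstore.store import S3Credential


async def fetch_token_from_service() -> dict[str, str]:
    """Hypothetical stand-in for an awaitable call to an identity service."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    return {"key": "...", "secret": "...", "session": "..."}


async def get_credentials_async() -> S3Credential:
    # Map the (hypothetical) service response onto the credential keys
    raw = await fetch_token_from_service()
    return {
        "access_key_id": raw["key"],
        "secret_access_key": raw["secret"],
        "token": raw["session"],
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
```

It would be passed the same way, e.g. `S3Store(..., credential_provider=get_credentials_async)`.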

More advanced credential providers, which may need to store state, can be class-based. See the [authentication user guide](../../authentication.md) for more information.
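A class-based provider can be sketched as a callable object whose instance attributes carry state between calls. This is a minimal, hypothetical example: `CountingCredentialProvider` is not part of obstore, it assumes obstore invokes the provider with no arguments (as with the plain `get_credentials` function), and the key values are placeholders.

```py
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checking; not needed at runtime
    from obstore.store import S3Credential


class CountingCredentialProvider:
    """Hypothetical stateful provider implemented as a callable object.

    The state here is a call counter plus a configurable lifetime; a real
    provider might hold an HTTP client or a cached refresh token instead.
    """

    def __init__(self, lifetime: timedelta = timedelta(minutes=30)) -> None:
        self.lifetime = lifetime
        self.calls = 0

    def __call__(self) -> S3Credential:
        self.calls += 1
        # "token" is omitted here, since it is not always required
        return {
            "access_key_id": f"placeholder-key-{self.calls}",
            "secret_access_key": "placeholder-secret",
            "expires_at": datetime.now(timezone.utc) + self.lifetime,
        }
```

An instance would then be passed in place of a function, e.g. `S3Store("bucket_name", credential_provider=CountingCredentialProvider())`.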

### Automatic token refresh

If the credential returned by the credential provider includes an `expires_at` key, obstore will **automatically** call the credential provider to refresh your token before the expiration time.

**Your code doesn't need to think about token expiration times!**

This makes it seamless to use something like the [AWS Security Token Service (STS)](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html), which issues temporary credentials that expire after a set duration (one hour by default for `AssumeRole`). See [`StsCredentialProvider`][obstore.auth.boto3.StsCredentialProvider] for an example of a credential provider that uses [`STS.Client.assume_role`][] to refresh tokens automatically.
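To make the refresh behavior concrete, the general pattern can be sketched as a cache that re-invokes the provider only when the cached credential is close to its `expires_at`. This is an illustration, not obstore's internal implementation: `RefreshingCredentialCache`, the five-minute `margin`, and the rule that a credential without an expiry is reused indefinitely are all assumptions. The clock is injectable so the behavior can be exercised without waiting.

```py
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any, Callable

Credential = dict[str, Any]


class RefreshingCredentialCache:
    """Illustrative cache that re-invokes `provider` shortly before expiry.

    Hypothetical helper; not obstore's actual refresh logic.
    """

    def __init__(
        self,
        provider: Callable[[], Credential],
        margin: timedelta = timedelta(minutes=5),
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self.provider = provider
        self.margin = margin  # refresh this long before `expires_at`
        self.clock = clock
        self._cached: Credential | None = None

    def get(self) -> Credential:
        # Call the provider on first use or when the cached credential is stale
        if self._cached is None or self._needs_refresh(self._cached):
            self._cached = self.provider()
        return self._cached

    def _needs_refresh(self, cred: Credential) -> bool:
        expires_at = cred.get("expires_at")
        if expires_at is None:
            # Assumption: a credential without an expiry is reused indefinitely
            return False
        return self.clock() >= expires_at - self.margin
```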

## Improved Fsspec integration

This release also significantly improves integration with the [fsspec](https://github.com/fsspec/filesystem_spec) ecosystem.

You can now [register][obstore.fsspec.register] obstore as the default handler for supported protocols, like `s3`, `gs`, and `az`. Then calling `fsspec.filesystem` or `fsspec.open` will automatically defer to [`obstore.fsspec.FsspecStore`][] and [`obstore.fsspec.BufferedFile`][], respectively.

The fsspec integration is no longer tied to a specific bucket. Instead, [`FsspecStore`][obstore.fsspec.FsspecStore] will automatically handle multiple buckets within a single protocol.

For example, obstore's fsspec integration is now tested with [pyarrow](../../examples/pyarrow.md).

For more information, read the [fsspec page in the user guide](../../fsspec.md).
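Serving several buckets from one filesystem object can be illustrated with a small router that parses the bucket out of each path and lazily creates one store per bucket. This is a hypothetical sketch, not how `FsspecStore` is actually implemented: `MultiBucketRouter` and the `make_store` factory are illustrative names, and in practice such a factory might construct an `S3Store` for each bucket.

```py
from __future__ import annotations

from typing import Any, Callable
from urllib.parse import urlsplit


class MultiBucketRouter:
    """Hypothetical router: one object per protocol, one cached store per bucket."""

    def __init__(self, protocol: str, make_store: Callable[[str], Any]) -> None:
        self.protocol = protocol
        self.make_store = make_store
        self._stores: dict[str, Any] = {}

    def split(self, url: str) -> tuple[str, str]:
        """Split 's3://bucket/key' or 'bucket/key' into (bucket, key)."""
        parts = urlsplit(url)
        if parts.scheme:
            if parts.scheme != self.protocol:
                raise ValueError(f"expected {self.protocol}://, got {parts.scheme}://")
            return parts.netloc, parts.path.lstrip("/")
        bucket, _, key = url.partition("/")
        return bucket, key

    def store_for(self, url: str) -> tuple[Any, str]:
        # Create the per-bucket store on first use, then reuse it
        bucket, key = self.split(url)
        if bucket not in self._stores:
            self._stores[bucket] = self.make_store(bucket)
        return self._stores[bucket], key
```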

## Improved AWS type hinting

Type hints have been improved for AWS enum-like values, such as the AWS region. When you construct an `S3Store` in an editor that supports type hints, you'll now receive suggestions for these values.

Here are two examples from VS Code:

![](../../assets/aws_type_hint1.png){: style="height:300px"}
![](../../assets/aws_type_hint2.png){: style="height:200px"}
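Suggestions like these are typically driven by `typing.Literal`: a parameter annotated with a `Literal` alias tells the editor which string values are valid. The sketch below uses a hypothetical, abbreviated `AwsRegion` alias and `make_config` function rather than obstore's actual definitions. Because `Literal` is not enforced at runtime, it also shows an optional explicit check using `typing.get_args`.

```py
from __future__ import annotations

from typing import Literal, get_args

# Hypothetical, abbreviated alias; not obstore's actual type definition
AwsRegion = Literal["us-east-1", "us-east-2", "us-west-1", "us-west-2", "eu-west-1"]


def make_config(region: AwsRegion) -> dict[str, str]:
    """Editors can offer the Literal members as completions for `region`."""
    # Literal hints are not checked at runtime, so validate explicitly if needed
    if region not in get_args(AwsRegion):
        raise ValueError(f"unknown region: {region!r}")
    return {"region": region}
```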

## Benchmarking

We've continued work on [benchmarking obstore](https://github.com/geospatial-jeff/pyasyncio-benchmark).

New benchmarks run on an EC2 M5 instance indicate obstore provides [2.8x higher throughput than aioboto3](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/40e67509a248c5102a6b1608bcb9773295691213/test_results/20250218_results/ec2_m5/aggregated_results.csv) when fetching the first 16KB of a file many times from an async context.
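The general shape of such a measurement (many small ranged reads issued concurrently from async code) can be sketched as below. This is not the linked benchmark's code: `fake_fetch_first_16kb` simulates a ranged read with a short sleep, and a real run would replace it with an actual client call (for obstore, something like `get_range_async`; for aioboto3, a ranged `get_object`). The concurrency limit is an arbitrary assumption.

```py
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable


async def measure_throughput(
    fetch: Callable[[], Awaitable[bytes]],
    n_requests: int,
    concurrency: int,
) -> tuple[int, float]:
    """Run `n_requests` calls to `fetch` with bounded concurrency.

    Returns (total bytes received, requests per second).
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def one() -> int:
        async with semaphore:  # cap the number of in-flight requests
            return len(await fetch())

    start = time.perf_counter()
    sizes = await asyncio.gather(*(one() for _ in range(n_requests)))
    elapsed = time.perf_counter() - start
    return sum(sizes), n_requests / elapsed


async def fake_fetch_first_16kb() -> bytes:
    """Hypothetical stand-in for a ranged GET of the first 16 KB of an object."""
    await asyncio.sleep(0.001)  # simulated network latency
    return b"\x00" * 16 * 1024
```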

## Improved documentation

- [fsspec documentation](../../fsspec.md)
- [pyarrow integration](../../examples/pyarrow.md)
- [authentication documentation](../../authentication.md)

## All updates

Refer to the [changelog](../../CHANGELOG.md) for all updates.
52 changes: 52 additions & 0 deletions docs/examples/pyarrow.md
@@ -0,0 +1,52 @@
# PyArrow

[PyArrow](https://arrow.apache.org/docs/python/index.html) is the canonical Python implementation for the Apache Arrow project.

PyArrow also supports reading and writing various file formats, including Parquet, CSV, JSON, and Arrow IPC.

PyArrow integration is supported [via its fsspec integration](https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow), since Obstore [exposes an fsspec-compatible API](../fsspec.md).

```py
import pyarrow.parquet as pq

from obstore.fsspec import FsspecStore

fs = FsspecStore("s3", skip_signature=True, region="us-west-2")

url = "s3://overturemaps-us-west-2/release/2025-02-19.0/theme=addresses/type=address/part-00010-e084a2d7-fea9-41e5-a56f-e638a3307547-c000.zstd.parquet"
parquet_file = pq.ParquetFile(url, filesystem=fs)
print(parquet_file.schema_arrow)
```
This prints:
```
id: string
geometry: binary
bbox: struct<xmin: float, xmax: float, ymin: float, ymax: float> not null
child 0, xmin: float
child 1, xmax: float
child 2, ymin: float
child 3, ymax: float
country: string
postcode: string
street: string
number: string
unit: string
address_levels: list<element: struct<value: string>>
child 0, element: struct<value: string>
child 0, value: string
postal_city: string
version: int32 not null
sources: list<element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>>
child 0, element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>
child 0, property: string
child 1, dataset: string
child 2, record_id: string
child 3, update_time: string
child 4, confidence: double
-- schema metadata --
geo: '{"version":"1.1.0","primary_column":"geometry","columns":{"geometry' + 230
org.apache.spark.legacyINT96: ''
org.apache.spark.version: '3.4.1'
org.apache.spark.sql.parquet.row.metadata: '{"type":"struct","fields":[{"' + 1586
org.apache.spark.legacyDateTime: ''
```
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -35,6 +35,7 @@ nav:
- Examples:
- examples/fastapi.md
- examples/minio.md
- examples/pyarrow.md
- examples/tqdm.md
- Blog:
- blog/index.md
2 changes: 1 addition & 1 deletion pyo3-object_store/src/error.rs
@@ -143,7 +143,7 @@ impl From<PyObjectStoreError> for PyErr {
object_store::Error::UnknownConfigurationKey { store: _, key: _ } => {
UnknownConfigurationKeyError::new_err(print_with_debug(err))
}
_ => GenericError::new_err(err.to_string()),
_ => GenericError::new_err(print_with_debug(err)),
},
PyObjectStoreError::IOError(err) => PyIOError::new_err(err),
}