
Commit 7cdcbf6

Prepare 0.5 changelog (#313)
* Prepare 0.5 changelog
* Write obstore 0.5 release post
* Update dates to Monday
* Update changelog
* pyarrow example
1 parent 42a765e commit 7cdcbf6

8 files changed: +205 -3 lines changed


CHANGELOG.md

+35-1
@@ -1,10 +1,42 @@
 # Changelog
 
-## [0.5.0] - unreleased
+## [0.5.0] - 2025-03-10
+
+### New Features :magic_wand:
+
+- **User-supplied credential callback** by @kylebarron in https://github.com/developmentseed/obstore/pull/234
+- **Fsspec updates**:
+  - [FEAT] Create obstore store in fsspec on demand by @machichima in https://github.com/developmentseed/obstore/pull/198
+  - [FEAT] support df.to_parquet and df.read_parquet() by @machichima in https://github.com/developmentseed/obstore/pull/165
+  - Document fsspec integration in user guide by @kylebarron in https://github.com/developmentseed/obstore/pull/299
+  - fsspec: Allow calling `register` with no arguments by @kylebarron in https://github.com/developmentseed/obstore/pull/298
+- Enable pickling Bytes by @kylebarron in https://github.com/developmentseed/obstore/pull/295
+- Add AWS literal type hints by @kylebarron in https://github.com/developmentseed/obstore/pull/301
+- pyo3-bytes slicing by @jessekrubin in https://github.com/developmentseed/obstore/pull/249
 
 ### Breaking changes :wrench:
 
 - Removed `S3Store.from_session` and `S3Store._from_native`. Use credential providers instead.
+- Rename `AsyncFsspecStore` to `FsspecStore` by @kylebarron in https://github.com/developmentseed/obstore/pull/297
+- Reduce the config variations supported for input. For example, we previously allowed `region`, `aws_region`, `REGION` or `AWS_REGION` as a config parameter to `S3Store`, which could be confusing. We now only support a single config input value for each underlying concept. https://github.com/developmentseed/obstore/pull/323
+
+### Bug fixes :bug:
+
+- Validate input for range request by @kylebarron in https://github.com/developmentseed/obstore/pull/255
+
+### Documentation :book:
+
+- Update performance numbers by @kylebarron in https://github.com/developmentseed/obstore/pull/307
+- Document type-only constructs by @kylebarron in https://github.com/developmentseed/obstore/pull/309, https://github.com/developmentseed/obstore/pull/311
+- Add import warning admonition on ObjectStore type by @kylebarron in
+- Update etag conditional put docs by @kylebarron in https://github.com/developmentseed/obstore/pull/310
+
+### New Contributors
+
+- @weiji14 made their first contribution in https://github.com/developmentseed/obstore/pull/272
+- @machichima made their first contribution in https://github.com/developmentseed/obstore/pull/198
+
+**Full Changelog**: https://github.com/developmentseed/obstore/compare/py-v0.4.0...py-v0.5.0
 
 ## [0.4.0] - 2025-02-10
 
@@ -19,11 +51,13 @@
 - Enable automatic cleanup for local store, when deleting directories by @kylebarron in https://github.com/developmentseed/obstore/pull/175
 - Optionally create root dir in LocalStore by @kylebarron in https://github.com/developmentseed/obstore/pull/177
 - **File-like object** updates:
+
   - Add support for writable file-like objects by @kylebarron in https://github.com/developmentseed/obstore/pull/167
   - Updates to readable file API:
 
     - Support user-specified capacity in readable file-like objects by @kylebarron in https://github.com/developmentseed/obstore/pull/174
     - Expose `ObjectMeta` from readable file API by @kylebarron in https://github.com/developmentseed/obstore/pull/176
+
 - Merge `config` and `kwargs` and validate that no configuration parameters have been passed multiple times. (https://github.com/developmentseed/obstore/pull/180, https://github.com/developmentseed/obstore/pull/182, https://github.com/developmentseed/obstore/pull/218)
 - Add `__repr__` to `Bytes` class by @jessekrubin in https://github.com/developmentseed/obstore/pull/173
 
docs/assets/aws_type_hint1.png

84.3 KB

docs/assets/aws_type_hint2.png

30.5 KB

docs/authentication.md

-1
@@ -67,7 +67,6 @@ from obstore.store import S3Store
 session = Session(...)
 credential_provider = Boto3CredentialProvider(session)
 store = S3Store("bucket_name", credential_provider=credential_provider)
-
 ```
 
 <!-- SSO authentication.

docs/blog/posts/obstore-0.5.md

+116
@@ -0,0 +1,116 @@
---
draft: false
date: 2025-03-10
categories:
  - Release
authors:
  - kylebarron
links:
  - CHANGELOG.md
---

# Releasing obstore 0.5!

Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.

This post gives an overview of what's new in obstore version 0.5.

<!-- more -->

Refer to the [changelog](../../CHANGELOG.md) for all updates.

## Credential providers

Authentication tends to be among the trickiest but most important elements of connecting to object storage. There are many ways to handle credentials, and supporting every one natively in Obstore would impose a high maintenance burden.

Instead, this release supports **custom credential providers**: Python callbacks that allow for full control over credential generation.

We'll dive into a few salient points, but make sure to read the [full authentication documentation](../../authentication.md) in the user guide.

### "Official" SDK credential providers

You can use [`Boto3CredentialProvider`][obstore.auth.boto3.Boto3CredentialProvider] to delegate credential handling to a [`boto3.Session`][boto3.session.Session].

```py
from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

session = Session(...)
credential_provider = Boto3CredentialProvider(session)
store = S3Store("bucket_name", credential_provider=credential_provider)
```

### Custom credential providers

There's a long tail of possible authentication mechanisms. Obstore allows you to provide your own custom authentication callback.

You can provide either a **synchronous or asynchronous** custom authentication function.

The simplest custom credential provider can be just a function callback:

```py
from __future__ import annotations

from datetime import datetime, timedelta, UTC
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Assumption: S3Credential is a type-only construct exposed by obstore.store.
    from obstore.store import S3Credential


def get_credentials() -> S3Credential:
    return {
        "access_key_id": "...",
        "secret_access_key": "...",
        # Not always required
        "token": "...",
        "expires_at": datetime.now(UTC) + timedelta(minutes=30),
    }
```

Then just pass that function into `credential_provider`:

```py
S3Store(..., credential_provider=get_credentials)
```
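
Since the callback may also be asynchronous, the same provider can be written as a coroutine function. The sketch below is illustrative only: `fetch_token_from_service` is a hypothetical stand-in for whatever awaitable call returns your credentials, and the returned keys mirror the synchronous example above.

```py
import asyncio
from datetime import datetime, timedelta, timezone


async def fetch_token_from_service() -> dict:
    # Hypothetical placeholder for a network call to your own token service.
    await asyncio.sleep(0)
    return {"key": "...", "secret": "...", "session": "..."}


async def get_credentials_async() -> dict:
    # Await the (hypothetical) service, then reshape its response into the
    # same keys used by the synchronous example.
    raw = await fetch_token_from_service()
    return {
        "access_key_id": raw["key"],
        "secret_access_key": raw["secret"],
        "token": raw["session"],
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
```

Assuming asynchronous callbacks are accepted in the same position as synchronous ones, this would be passed as `credential_provider=get_credentials_async`.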

More advanced credential providers, which may need to store state, can be class based. See the [authentication user guide](../../authentication.md) for more information.
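
As a rough sketch of what a class-based provider might look like, assuming it is simply an object made callable through `__call__`: the class name, the `role_name` parameter, and the `_fetch` helper below are hypothetical and only illustrate keeping state between calls. The authentication user guide remains the reference for the actual interface.

```py
from datetime import datetime, timedelta, timezone


class RotatingCredentialProvider:
    """Hypothetical class-based provider that keeps state between calls."""

    def __init__(self, role_name: str, lifetime: timedelta = timedelta(minutes=30)) -> None:
        self.role_name = role_name
        self.lifetime = lifetime
        self.refresh_count = 0  # state that a plain function could not easily hold

    def _fetch(self) -> tuple:
        # Placeholder: a real provider would call an identity service here.
        return ("...", "...", f"{self.role_name}-token-{self.refresh_count}")

    def __call__(self) -> dict:
        # Each call counts as one refresh and returns a fresh credential dict.
        self.refresh_count += 1
        key, secret, token = self._fetch()
        return {
            "access_key_id": key,
            "secret_access_key": secret,
            "token": token,
            "expires_at": datetime.now(timezone.utc) + self.lifetime,
        }
```

An instance, for example `RotatingCredentialProvider("my-role")`, would then be passed wherever a function callback is accepted.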

### Automatic token refresh

If the credential returned by the credential provider includes an `expires_at` key, obstore will **automatically** call the credential provider to refresh your token before the expiration time.

**Your code doesn't need to think about token expiration times!**

This allows for seamlessly using something like the [AWS Security Token Service (STS)](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html), which issues temporary credentials that expire after a limited time (one hour by default for `assume_role`). See [`StsCredentialProvider`][obstore.auth.boto3.StsCredentialProvider] for an example of a credential provider that uses [`STS.Client.assume_role`][] to automatically refresh tokens.
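
To make the refresh behavior concrete, here is a small pure-Python model of the idea. It is not obstore's implementation: the `RefreshingCache` class and the five-minute safety margin are assumptions chosen for illustration. The cache calls the provider again only when it holds no credential yet or the held one is about to expire.

```py
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class RefreshingCache:
    """Illustrative model of expiry-driven refresh (not obstore's code)."""

    def __init__(self, provider: Callable[[], dict], min_ttl: timedelta = timedelta(minutes=5)) -> None:
        self._provider = provider
        self._min_ttl = min_ttl  # assumed safety margin before expiry
        self._cached: Optional[dict] = None

    def get(self, now: Optional[datetime] = None) -> dict:
        now = now or datetime.now(timezone.utc)
        cached = self._cached
        expires_at = cached.get("expires_at") if cached else None
        # Refresh when nothing is cached, or when the remaining lifetime
        # has dropped to the safety margin or below.
        needs_refresh = cached is None or (
            expires_at is not None and expires_at - now <= self._min_ttl
        )
        if needs_refresh:
            self._cached = self._provider()
        return self._cached
```

In this model a credential without an `expires_at` key is fetched once and then reused indefinitely, which matches the post's description that refresh is driven by the expiration time.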

## Improved Fsspec integration

This release also significantly improves integration with the [fsspec](https://github.com/fsspec/filesystem_spec) ecosystem.

You can now [register][obstore.fsspec.register] obstore as the default handler for supported protocols, like `s3`, `gs`, and `az`. Then calling `fsspec.filesystem` or `fsspec.open` will automatically defer to [`obstore.fsspec.FsspecStore`][] and [`obstore.fsspec.BufferedFile`][], respectively.

The fsspec integration is no longer tied to a specific bucket. Instead, [`FsspecStore`][obstore.fsspec.FsspecStore] will automatically handle multiple buckets within a single protocol.

For example, obstore's fsspec integration is now tested as working with [pyarrow](../../examples/pyarrow.md).

For more information, read the [fsspec page in the user guide](../../fsspec.md).

## Improved AWS type hinting

Type hinting has been improved for AWS enums, such as the AWS region. Now, when you're constructing an `S3Store`, if your editor supports it, you'll receive suggestions based on the type hints.

Here are two examples from VS Code:

![](../../assets/aws_type_hint1.png){: style="height:300px"}
![](../../assets/aws_type_hint2.png){: style="height:200px"}

## Benchmarking

We've continued work on [benchmarking obstore](https://github.com/geospatial-jeff/pyasyncio-benchmark).

New benchmarks run on an EC2 M5 instance indicate obstore provides [2.8x higher throughput than aioboto3](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/40e67509a248c5102a6b1608bcb9773295691213/test_results/20250218_results/ec2_m5/aggregated_results.csv) when fetching the first 16 KB of a file many times from an async context.

## Improved documentation

- [fsspec documentation](../../fsspec.md)
- [pyarrow integration](../../examples/pyarrow.md)
- [authentication documentation](../../authentication.md)

## All updates

Refer to the [changelog](../../CHANGELOG.md) for all updates.

docs/examples/pyarrow.md

+52
@@ -0,0 +1,52 @@
# PyArrow

[PyArrow](https://arrow.apache.org/docs/python/index.html) is the canonical Python implementation for the Apache Arrow project.

PyArrow also supports reading and writing various file formats, including Parquet, CSV, JSON, and Arrow IPC.

PyArrow integration is supported [via its fsspec integration](https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow), since Obstore [exposes an fsspec-compatible API](../fsspec.md).

```py
import pyarrow.parquet as pq

from obstore.fsspec import FsspecStore

fs = FsspecStore("s3", skip_signature=True, region="us-west-2")

url = "s3://overturemaps-us-west-2/release/2025-02-19.0/theme=addresses/type=address/part-00010-e084a2d7-fea9-41e5-a56f-e638a3307547-c000.zstd.parquet"
parquet_file = pq.ParquetFile(url, filesystem=fs)
print(parquet_file.schema_arrow)
```
prints:
```
id: string
geometry: binary
bbox: struct<xmin: float, xmax: float, ymin: float, ymax: float> not null
  child 0, xmin: float
  child 1, xmax: float
  child 2, ymin: float
  child 3, ymax: float
country: string
postcode: string
street: string
number: string
unit: string
address_levels: list<element: struct<value: string>>
  child 0, element: struct<value: string>
      child 0, value: string
postal_city: string
version: int32 not null
sources: list<element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>>
  child 0, element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>
      child 0, property: string
      child 1, dataset: string
      child 2, record_id: string
      child 3, update_time: string
      child 4, confidence: double
-- schema metadata --
geo: '{"version":"1.1.0","primary_column":"geometry","columns":{"geometry' + 230
org.apache.spark.legacyINT96: ''
org.apache.spark.version: '3.4.1'
org.apache.spark.sql.parquet.row.metadata: '{"type":"struct","fields":[{"' + 1586
org.apache.spark.legacyDateTime: ''
```

mkdocs.yml

+1
@@ -35,6 +35,7 @@ nav:
   - Examples:
     - examples/fastapi.md
     - examples/minio.md
+    - examples/pyarrow.md
     - examples/tqdm.md
   - Blog:
     - blog/index.md

pyo3-object_store/src/error.rs

+1-1
@@ -143,7 +143,7 @@ impl From<PyObjectStoreError> for PyErr {
                 object_store::Error::UnknownConfigurationKey { store: _, key: _ } => {
                     UnknownConfigurationKeyError::new_err(print_with_debug(err))
                 }
-                _ => GenericError::new_err(err.to_string()),
+                _ => GenericError::new_err(print_with_debug(err)),
             },
             PyObjectStoreError::IOError(err) => PyIOError::new_err(err),
         }
