Commit 234cc8d

Docs updates for 0.4 release (#250)
* Set up blog plugin
* Add changelog
* Add alternatives doc
* Add note to perf doc
* wip release post
* Finish release notes
* one store per bucket note
* turn off draft
1 parent 51f313b commit 234cc8d

13 files changed, +315 -18 lines changed

Diff for: CHANGELOG.md

+53-3
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,60 @@
11
# Changelog
22

3-
## [0.4.0] -
3+
## [0.4.0] - 2025-02-10
4+
5+
### New Features :magic_wand:
6+
7+
- **Support for pickling** & always manage store prefix by @kylebarron in https://github.com/developmentseed/obstore/pull/185, https://github.com/developmentseed/obstore/pull/239, https://github.com/developmentseed/obstore/pull/223
8+
- **Add top-level `obstore.store.from_url` function**, which delegates to each store's `from_url` constructor by @kylebarron in https://github.com/developmentseed/obstore/pull/179, https://github.com/developmentseed/obstore/pull/201
9+
- Add option to return Arrow from `list_with_delimiter` by @kylebarron in https://github.com/developmentseed/obstore/pull/238, https://github.com/developmentseed/obstore/pull/244
10+
- (Provisional) **Enhanced loading of s3 credentials** using `aws-config` crate by @kylebarron in https://github.com/developmentseed/obstore/pull/203
11+
- **Access config values out from stores** by @kylebarron in https://github.com/developmentseed/obstore/pull/210
12+
- LocalStore updates:
  - Enable automatic cleanup for local store, when deleting directories by @kylebarron in https://github.com/developmentseed/obstore/pull/175
  - Optionally create root dir in LocalStore by @kylebarron in https://github.com/developmentseed/obstore/pull/177
- **File-like object** updates:
  - Add support for writable file-like objects by @kylebarron in https://github.com/developmentseed/obstore/pull/167
  - Updates to readable file API:

    - Support user-specified capacity in readable file-like objects by @kylebarron in https://github.com/developmentseed/obstore/pull/174
    - Expose `ObjectMeta` from readable file API by @kylebarron in https://github.com/developmentseed/obstore/pull/176
21+
- Merge `config` and `kwargs` and validate that no configuration parameters have been passed multiple times. (https://github.com/developmentseed/obstore/pull/180, https://github.com/developmentseed/obstore/pull/182, https://github.com/developmentseed/obstore/pull/218)
22+
- Add `__repr__` to `Bytes` class by @jessekrubin in https://github.com/developmentseed/obstore/pull/173
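
The `config`/`kwargs` merging mentioned in the list above can be pictured with a small pure-Python sketch. `merge_config` is a hypothetical helper written only for illustration; it is not obstore's code and says nothing about how the real validation is implemented.

```python
from __future__ import annotations

from typing import Any


def merge_config(config: dict[str, Any] | None = None, **kwargs: Any) -> dict[str, Any]:
    """Merge a `config` dict with keyword arguments, rejecting duplicates.

    Hypothetical illustration; obstore's actual validation is not shown here.
    """
    merged = dict(config or {})
    # Any key present in both sources was passed more than once.
    duplicated = sorted(set(merged) & set(kwargs))
    if duplicated:
        raise ValueError(f"configuration passed multiple times: {duplicated}")
    merged.update(kwargs)
    return merged


print(merge_config({"region": "us-west-2"}, skip_signature=True))
```

Failing loudly on a duplicate avoids having to guess which of two conflicting values the caller meant.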
423

524
### Breaking changes :wrench:
625

726
- `get_range`, `get_range_async`, `get_ranges`, and `get_ranges_async` now require named parameters for `start`, `end`, and `length` to make the semantics of the range request fully explicit, by @kylebarron in https://github.com/developmentseed/obstore/pull/156
27+
- Previously, individual stores did not manage a prefix path within the remote resource and [`PrefixStore`](https://developmentseed.org/obstore/v0.3.0/api/store/middleware/#obstore.store.PrefixStore) was used to enable this. As of 0.4.0, `PrefixStore` was removed and all stores manage an optional mount prefix natively.
28+
- `obstore.open` has been renamed to `obstore.open_reader`.
29+
- The `from_env` constructor has been removed from `S3Store`, `GCSStore`, and `AzureStore`. Now all constructors will read from environment variables. Use `__init__` or `from_url` instead. https://github.com/developmentseed/obstore/pull/189
30+
- `obstore.exceptions.ObstoreError` renamed to `obstore.exceptions.BaseError` https://github.com/developmentseed/obstore/pull/200
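
To make the first breaking change above concrete: a byte range can be described either by `start` and `end` or by `start` and `length`, and positional arguments leave it unclear which is meant. The sketch below is a hypothetical pure-Python helper (`resolve_range` is not part of obstore, and treating `end` as exclusive is an assumption of this sketch) showing how keyword-only parameters remove that ambiguity.

```python
from __future__ import annotations


def resolve_range(*, start: int, end: int | None = None, length: int | None = None) -> tuple[int, int]:
    """Return the half-open byte range (start, end) for a range request.

    Hypothetical helper for illustration only; not obstore's implementation.
    Exactly one of `end` or `length` must be given.
    """
    if (end is None) == (length is None):
        raise ValueError("pass exactly one of `end` or `length`")
    if end is None:
        end = start + length
    if end < start:
        raise ValueError("`end` must not be smaller than `start`")
    return start, end


# Both calls describe the same five bytes, and the call site says how.
print(resolve_range(start=10, end=15))
print(resolve_range(start=10, length=5))
```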
31+
32+
### Bug fixes :bug:
33+
34+
- Fix pylance finding exceptions module by @kylebarron in https://github.com/developmentseed/obstore/pull/183
35+
- Allow passing in partial retry/backoff config by @kylebarron in https://github.com/developmentseed/obstore/pull/205
36+
- Fix returning None from async functions by @kylebarron in https://github.com/developmentseed/obstore/pull/245
37+
- Fix LocalStore range request past end of file, by @kylebarron in https://github.com/developmentseed/obstore/pull/230
38+
39+
### Documentation :book:
40+
41+
- Update wording for fsspec docstring by @kylebarron in https://github.com/developmentseed/obstore/pull/195
42+
- Add documentation about AWS region by @kylebarron in https://github.com/developmentseed/obstore/pull/213
43+
- Add developer documentation for functional API choice by @kylebarron in https://github.com/developmentseed/obstore/pull/215
44+
- Add `tqdm` progress bar example by @kylebarron in https://github.com/developmentseed/obstore/pull/237
45+
- Add contributor, performance, integrations docs by @kylebarron in https://github.com/developmentseed/obstore/pull/227
46+
- Add minio example by @kylebarron in https://github.com/developmentseed/obstore/pull/241
47+
48+
### Other
49+
50+
- Use manylinux 2_24 for aarch64 linux wheels by @kylebarron in https://github.com/developmentseed/obstore/pull/225
51+
52+
### New Contributors
53+
54+
- @vincentsarago made their first contribution in https://github.com/developmentseed/obstore/pull/168
55+
- @jessekrubin made their first contribution in https://github.com/developmentseed/obstore/pull/173
56+
57+
**Full Changelog**: https://github.com/developmentseed/obstore/compare/py-v0.3.0...py-v0.4.0
858

959
## [0.3.0] - 2025-01-16
1060

@@ -39,12 +89,12 @@
3989
- Add note that S3Store can be constructed without boto3 by @kylebarron in https://github.com/developmentseed/obstore/pull/108
4090
- HTTP Store usage example by @kylebarron in https://github.com/developmentseed/obstore/pull/142
4191

42-
## What's Changed
92+
### What's Changed
4393

4494
- Improved docs for from_url by @kylebarron in https://github.com/developmentseed/obstore/pull/138
4595
- Implement read_all for async iterable by @kylebarron in https://github.com/developmentseed/obstore/pull/140
4696

47-
## New Contributors
97+
### New Contributors
4898

4999
- @willemarcel made their first contribution in https://github.com/developmentseed/obstore/pull/64
50100
- @martindurant made their first contribution in https://github.com/developmentseed/obstore/pull/63

Diff for: README.md

+5-1
@@ -11,7 +11,11 @@
1111
[pypi-img]: https://img.shields.io/pypi/dm/obstore
1212
[pypi-link]: https://pypi.org/project/obstore/
1313

14-
The simplest, highest-throughput [^1] interface to Amazon S3, Google Cloud Storage, Azure Blob Storage, and S3-compliant APIs like Cloudflare R2.
14+
The simplest, highest-throughput [^1] Python interface to [S3][s3], [GCS][gcs], [Azure Storage][azure_storage], & other S3-compliant APIs, powered by Rust.
15+
16+
[s3]: https://aws.amazon.com/s3/
17+
[gcs]: https://cloud.google.com/storage
18+
[azure_storage]: https://learn.microsoft.com/en-us/azure/storage/common/storage-introduction
1519

1620
- Sync and async API with **full type hinting**.
1721
- **Streaming downloads** with configurable chunking.

Diff for: docs/alternatives.md

+50
@@ -0,0 +1,50 @@
1+
# Alternatives to Obstore
2+
3+
## Obstore vs fsspec
4+
5+
[Fsspec](https://github.com/fsspec/filesystem_spec) is a generic specification for pythonic filesystems. It includes implementations for several cloud storage providers, including [s3fs](https://github.com/fsspec/s3fs) for Amazon S3, [gcsfs](https://github.com/fsspec/gcsfs) for Google Cloud Storage, and [adlfs](https://github.com/fsspec/adlfs) for Azure Storage.
6+
7+
### API Differences
8+
9+
Like Obstore, fsspec presents an abstraction layer that allows you to write code once to interface with multiple cloud providers. However, the abstracted APIs they present differ: Obstore tries to mirror **native object store** APIs, while fsspec tries to mirror a **file-like** API.
10+
11+
The upstream Rust library powering obstore, [`object_store`](https://docs.rs/object_store), documents why [it intentionally avoids](https://docs.rs/object_store/latest/object_store/index.html#why-not-a-filesystem-interface) a primary file-like API:
12+
13+
> The `ObjectStore` interface is designed to mirror the APIs of object stores and not filesystems, and thus has stateless APIs instead of cursor based interfaces such as `Read` or `Seek` available in filesystems.
14+
>
15+
> This design provides the following advantages:
16+
>
17+
> - All operations are atomic, and readers cannot observe partial and/or failed writes
18+
> - Methods map directly to object store APIs, providing both efficiency and predictability
19+
> - Abstracts away filesystem and operating system specific quirks, ensuring portability
20+
> - Allows for functionality not native to filesystems, such as operation preconditions and atomic multipart uploads
21+
22+
Obstore's primary APIs, like [`get`][obstore.get], [`put`][obstore.put], and [`list`][obstore.list], mirror such object store APIs. However, if you still need to use a file-like API, Obstore provides such APIs with [`open_reader`][obstore.open_reader] and [`open_writer`][obstore.open_writer].
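
As a rough illustration of the difference between the two styles, the sketch below reads the same bytes once statelessly and once through a cursor. It uses only the standard library: the in-memory `OBJECTS` dict and the `read_range` helper are hypothetical stand-ins for an object store, not obstore APIs.

```python
import io

# Hypothetical in-memory "bucket": object key -> object bytes.
OBJECTS = {"data.bin": b"0123456789"}


def read_range(key: str, *, start: int, end: int) -> bytes:
    """Stateless read: everything the request needs is in its arguments."""
    return OBJECTS[key][start:end]


# Object-store style: each call is independent and can be retried or
# issued concurrently, because no position is remembered between calls.
first = read_range("data.bin", start=2, end=5)

# File-like style: the result depends on the cursor left by earlier calls.
f = io.BytesIO(OBJECTS["data.bin"])
f.seek(2)
second = f.read(3)

print(first, second)
```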
23+
24+
Obstore also includes a best-effort [fsspec compatibility layer][obstore.fsspec], which allows you to use obstore in applications that expect an fsspec-compatible API.
25+
26+
### Performance
27+
28+
Beyond API design, performance can also be a consideration. [Initial benchmarks](./performance.md) show that obstore's async API can provide 9x higher throughput than fsspec's async API.
29+
30+
## Obstore vs boto3
31+
32+
[boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) is the official Python client for working with AWS services, including S3.
33+
34+
boto3 supports all features of S3, including some features that obstore doesn't provide, like creating or deleting buckets.
35+
36+
However, boto3 is synchronous and specific to AWS. To support multiple clouds you'd need to use boto3 and another library and abstract away those differences yourself. With obstore you can interface with data in multiple clouds, changing only configuration settings.
37+
38+
## Obstore vs aioboto3
39+
40+
[aioboto3](https://github.com/terricain/aioboto3) is an async Python client for S3, wrapping boto3 and [aiobotocore](https://github.com/aio-libs/aiobotocore).
41+
42+
aioboto3 presents largely the same API as boto3, but async. As above, this means that it may support more S3-specific features than obstore does.
43+
44+
But it's still specific to AWS, and in early [benchmarks](./performance.md) we've measured obstore to provide significantly higher throughput than aioboto3.
45+
46+
## Obstore vs Google Cloud Storage Python Client
47+
48+
The official [Google Cloud Storage Python client](https://cloud.google.com/python/docs/reference/storage/latest) [uses requests](https://github.com/googleapis/python-storage/blob/f2cc9c5a2b1cc9724ca1269b8d452304da96bf03/setup.py#L42) as its HTTP client. This means that the GCS Python client supports only synchronous requests.
49+
50+
It also presents a Google-specific API, so you'd need to re-implement your code if you want to use multiple cloud providers.

Diff for: docs/assets/sentinel2-grca-thumbnail-obstore-04.jpg

111 KB

Diff for: docs/blog/.authors.yml

+5
@@ -0,0 +1,5 @@
1+
authors:
  kylebarron:
    name: Kyle Barron
    description: Creator
    avatar: https://github.com/kylebarron.png

Diff for: docs/blog/index.md

+1
@@ -0,0 +1 @@
1+
# Blog

Diff for: docs/blog/posts/obstore-0.4.md

+150
@@ -0,0 +1,150 @@
1+
---
draft: false
date: 2025-02-10
categories:
  - Release
authors:
  - kylebarron
links:
  - CHANGELOG.md
---
11+
12+
# Releasing obstore 0.4!
13+
14+
Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust.
15+
16+
This post gives an overview of what's new in obstore version 0.4.
17+
18+
<!-- more -->
19+
20+
Refer to the [changelog](../../CHANGELOG.md#040-2025-02-10) for all updates.
21+
22+
## Easier store creation with `from_url`
23+
24+
There's a new top-level [`obstore.store.from_url`][] function, which makes it dead-simple to create a store from a URL.
25+
26+
Here's an example of using it to inspect data from the [Sentinel-2 open data bucket](https://registry.opendata.aws/sentinel-2-l2a-cogs/). `from_url` automatically infers that this is an S3 path and constructs an [`S3Store`][obstore.store.S3Store], which we can pass to [`obstore.list_with_delimiter`][] and [`obstore.get`][].
27+
28+
```py
import obstore as obs
from obstore.store import from_url

# The base path within the bucket to "mount" to
url = "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2A_12SUF_20220601_0_L2A"

# Pass in store-specific parameters as keyword arguments
# Here, we pass `skip_signature=True` because it's a public bucket
store = from_url(url, region="us-west-2", skip_signature=True)

# Print filenames in this directory
print([meta["path"] for meta in obs.list_with_delimiter(store)["objects"]])
# ['AOT.tif', 'B01.tif', 'B02.tif', 'B03.tif', 'B04.tif', 'B05.tif', 'B06.tif', 'B07.tif', 'B08.tif', 'B09.tif', 'B11.tif', 'B12.tif', 'B8A.tif', 'L2A_PVI.tif', 'S2A_12SUF_20220601_0_L2A.json', 'SCL.tif', 'TCI.tif', 'WVP.tif', 'granule_metadata.xml', 'thumbnail.jpg', 'tileinfo_metadata.json']

# Download thumbnail
with open("thumbnail.jpg", "wb") as f:
    f.write(obs.get(store, "thumbnail.jpg").bytes())
```
47+
48+
And voilà, we have a thumbnail of the Grand Canyon from space:
49+
50+
![](../../assets/sentinel2-grca-thumbnail-obstore-04.jpg)
51+
52+
`from_url` also supports typing overloads, so your type checker will raise an error if you try to mix AWS-specific and Azure-specific configuration.
53+
54+
Nevertheless, for best typing support, we still suggest using one of the store-specific `from_url` constructors (such as [`S3Store.from_url`][obstore.store.S3Store.from_url]) if you know the protocol. Then your type checker can infer the type of the returned store.
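
The scheme-based dispatch that `from_url` performs can be pictured with the sketch below. It is a simplified, hypothetical stand-in written with the standard library only: the `SCHEME_TO_STORE` table and the `infer_store_name` function are illustrative assumptions, and the real function recognises more URL forms than the few schemes listed here.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the store class that would handle it.
SCHEME_TO_STORE = {
    "s3": "S3Store",
    "gs": "GCSStore",
    "az": "AzureStore",
    "file": "LocalStore",
    "memory": "MemoryStore",
}


def infer_store_name(url: str) -> str:
    """Return the name of the store class suggested by the URL scheme."""
    scheme = urlparse(url).scheme
    try:
        return SCHEME_TO_STORE[scheme]
    except KeyError:
        raise ValueError(f"unsupported URL scheme: {scheme!r}") from None


print(infer_store_name("s3://sentinel-cogs/sentinel-s2-l2a-cogs"))
```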
55+
56+
57+
## Pickle support
58+
59+
One of obstore's initial integration targets is [zarr-python](https://github.com/zarr-developers/zarr-python), which needs to load large chunked N-dimensional arrays from object storage. In our [early benchmarking](https://github.com/maxrjones/zarr-obstore-performance), we've found that the [obstore-based backend](https://github.com/zarr-developers/zarr-python/pull/1661) can cut data loading times in half as compared to the standard fsspec-based backend.
60+
61+
However, Zarr is commonly used in distributed execution environments like [Dask](https://www.dask.org/), which needs to be able to move store instances between workers. We've implemented [pickle](https://docs.python.org/3/library/pickle.html) support for store classes to unblock this use case. Read [our pickle documentation](../../advanced/pickle.md) for more info.
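
One common way to make an object picklable when it wraps state that cannot itself be serialized is to pickle only the arguments needed to rebuild it. The sketch below shows that pattern with a hypothetical `DemoStore` class and the standard library alone; it illustrates the general idea and is not a description of how obstore implements pickling.

```python
from __future__ import annotations

import pickle
import threading


class DemoStore:
    """Hypothetical store holding a resource that cannot be pickled directly."""

    def __init__(self, bucket: str, prefix: str = "", config: dict | None = None) -> None:
        self.bucket = bucket
        self.prefix = prefix
        self.config = dict(config or {})
        # Stand-in for native client state; locks cannot be pickled.
        self._client_lock = threading.Lock()

    def __reduce__(self):
        # Pickle only what is needed to call the constructor again,
        # expressed as a (callable, args) pair.
        return (DemoStore, (self.bucket, self.prefix, self.config))


store = DemoStore("my-bucket", "data/", {"region": "us-west-2"})
clone = pickle.loads(pickle.dumps(store))
print(clone.bucket, clone.prefix, clone.config)
```

Because the copy is rebuilt through the constructor, each worker ends up with its own fresh client state rather than a shared one.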
62+
63+
## Enhanced loading of AWS credentials (provisional)
64+
65+
By default, each store class expects to find credential information either in environment variables or in passed-in arguments. In the case of AWS, that means the default constructors will not look in file-based credential sources.
66+
67+
The provisional [`S3Store._from_native`][obstore.store.S3Store._from_native] constructor uses the [official AWS Rust configuration crate](https://docs.rs/aws-config/latest/aws_config/) to find credentials on the file system. This integration is expected to also automatically refresh temporary credentials before expiration.
68+
69+
This API is provisional and may change in the future. If you have any feedback, please [open an issue](https://github.com/developmentseed/obstore/issues/new/choose).
70+
71+
Obstore version 0.5 is expected to improve on extensible credentials by enabling users to pass in arbitrary credentials in a sync or async function callback.
72+
73+
## Return Arrow data from `list_with_delimiter`
74+
75+
By default, the [`obstore.list`][] and [`obstore.list_with_delimiter`][] APIs [return standard Python `dict`s][obstore.ObjectMeta]. However, if you're listing a large bucket, the overhead of materializing all those Python objects can become significant.
76+
77+
[`obstore.list`][] and [`obstore.list_with_delimiter`][] now both support a `return_arrow` keyword parameter. If set to `True`, an Arrow [`RecordBatch`][arro3.core.RecordBatch] or [`Table`][arro3.core.Table] will be returned, which is both faster and more memory efficient.
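
To give a feel for why a columnar result is cheaper than a list of `dict`s, the sketch below compares the two layouts for the same synthetic listing using only the standard library. The `array` module stands in for an Arrow column here; the data is made up, the comparison is deliberately rough, and the exact byte counts depend on the Python build.

```python
import sys
from array import array

# Synthetic listing: one dict per object, similar in shape to the default return type.
rows = [{"path": f"file-{i}.tif", "size": i * 1024} for i in range(1000)]

# Row layout: every entry is its own dict holding a boxed Python integer.
row_bytes = sum(sys.getsizeof(row) + sys.getsizeof(row["size"]) for row in rows)

# Columnar layout: all sizes live in one contiguous buffer of 64-bit integers.
size_column = array("q", (row["size"] for row in rows))
column_bytes = sys.getsizeof(size_column)

print(f"row layout (dicts + boxed sizes): {row_bytes} bytes")
print(f"columnar layout (sizes only): {column_bytes} bytes")
```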
78+
79+
## Access configuration values back from a store
80+
81+
There are new attributes, such as [`config`][obstore.store.S3Store.config], [`client_options`][obstore.store.S3Store.client_options], and [`retry_config`][obstore.store.S3Store.retry_config] for accessing configuration parameters _back_ from a store instance.
82+
83+
This example uses an [`S3Store`][obstore.store.S3Store] but the same behavior applies to [`GCSStore`][obstore.store.GCSStore] and [`AzureStore`][obstore.store.AzureStore] as well.
84+
85+
```py
from obstore.store import S3Store

store = S3Store.from_url(
    "s3://ookla-open-data/parquet/performance/type=fixed/year=2024/quarter=1",
    region="us-west-2",
    skip_signature=True,
)
new_store = S3Store(
    config=store.config,
    prefix=store.prefix,
    client_options=store.client_options,
    retry_config=store.retry_config,
)
assert store.config == new_store.config
assert store.prefix == new_store.prefix
assert store.client_options == new_store.client_options
assert store.retry_config == new_store.retry_config
```
104+
105+
## Open remote objects as file-like readers or writers
106+
107+
This version adds support for opening remote objects as a [file-like](../../api/file.md) reader or writer.
108+
109+
```py
import os

import obstore as obs
from obstore.store import MemoryStore

# Create an in-memory store
store = MemoryStore()

# Iteratively write to the file
with obs.open_writer(store, "new_file.csv") as writer:
    writer.write(b"col1,col2,col3\n")
    writer.write(b"a,1,True\n")
    writer.write(b"b,2,False\n")
    writer.write(b"c,3,True\n")


# Open a reader from the file
reader = obs.open_reader(store, "new_file.csv")
file_length = reader.seek(0, os.SEEK_END)
print(file_length) # 43
reader.seek(0)
buf = reader.read()
print(buf)
# Bytes(b"col1,col2,col3\na,1,True\nb,2,False\nc,3,True\n")
```
135+
136+
See [`obstore.open_reader`][] and [`obstore.open_writer`][] for more details. Async file-like readers and writers are also provided; see [`obstore.open_reader_async`][] and [`obstore.open_writer_async`][].
137+
138+
## Benchmarking
139+
140+
[Benchmarking is still ongoing](https://github.com/geospatial-jeff/pyasyncio-benchmark), but early results have been very promising and we've [added documentation about our progress so far](../../performance.md).
141+
142+
## New examples
143+
144+
We've worked to update the documentation with more examples! We now have examples for how to use obstore with [FastAPI](../../examples/fastapi.md), [MinIO](../../examples/minio.md), and [tqdm](../../examples/tqdm.md).
145+
146+
We've also worked to consolidate introductory documentation into the ["user guide"](../../getting-started.md).
147+
148+
## All updates
149+
150+
Refer to the [changelog](../../CHANGELOG.md#040-2025-02-10) for all updates.

Diff for: docs/getting-started.md

+2
@@ -20,6 +20,8 @@ Alternatively, you can construct a store directly:
2020

2121
Each store concept has a variety of constructors, and a host of configuration options.
2222

23+
Note that each store is scoped to **one bucket**, so you'll have to create a separate store instance per bucket, even if they're in the same region.
24+
2325
**Example:**
2426

2527
For example, multiple ways to create an anonymous `S3Store` client (without any credentials, for use with fully public buckets):

Diff for: docs/overrides/main.html

+18
@@ -0,0 +1,18 @@
1+
{% extends "base.html" %}

{% block content %}
{% if page.nb_url %}
    <a href="{{ page.nb_url }}" title="Download Notebook" class="md-content__button md-icon">
        {% include ".icons/material/download.svg" %}
    </a>
{% endif %}

{{ super() }}
{% endblock content %}

{% block outdated %}
  You're not viewing the latest version.
  <a href="{{ '../' ~ base_url }}">
    <strong>Click here to go to latest.</strong>
  </a>
{% endblock %}

Diff for: docs/performance.md

+9
@@ -46,3 +46,12 @@ For example, [preliminary results](https://github.com/geospatial-jeff/pyasyncio-
4646

4747
Keep in mind, however, that what looks like a single request may actually be multiple requests under the hood. [`obstore.put`][obstore.put] will use multipart uploads by default, meaning that various parts of a file will be uploaded concurrently, and there may be efficiency gains here.
4848
- Latency: this is primarily driven by hardware and network conditions, and we expect Obstore's latency to be similar to that of other Python request libraries.
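
The multipart behaviour mentioned above can be pictured as splitting a payload into fixed-size parts and sending them concurrently. The sketch below is a hypothetical standard-library illustration: `upload_part` is a stand-in that only reports what it receives, and the part size is an arbitrary choice, so none of this reflects obstore's actual defaults or internals.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def split_parts(data: bytes, part_size: int) -> list[bytes]:
    """Split `data` into consecutive parts of at most `part_size` bytes."""
    return [data[i : i + part_size] for i in range(0, len(data), part_size)]


def upload_part(numbered_part: tuple[int, bytes]) -> tuple[int, int]:
    """Hypothetical stand-in for a network call; returns (part number, size)."""
    number, part = numbered_part
    return number, len(part)


payload = b"x" * 2500
parts = split_parts(payload, part_size=1000)

# Parts are independent of each other, so they can be sent concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    receipts = list(pool.map(upload_part, enumerate(parts, start=1)))

print(receipts)
```

From the caller's point of view this is still one `put`, which is why a single request in a benchmark may hide several concurrent ones.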
49+
50+
## Future research
51+
52+
In the future, we'd like to benchmark:
53+
54+
- Alternate Python event loops, e.g. [`uvloop`](https://github.com/MagicStack/uvloop)
55+
- The obstore synchronous API
56+
57+
If you have any interest in collaborating on this, [open an issue](https://github.com/developmentseed/obstore/issues/new/choose).
