Commit b6510d7

Document fsspec integration in user guide (#299)
1 parent 94b69cc commit b6510d7

5 files changed

+164
-20
lines changed

README.md

+2-3
@@ -22,11 +22,10 @@ The simplest, highest-throughput [^1] Python interface to [S3][s3], [GCS][gcs],
2222
- **Streaming uploads** from files or async or sync iterators.
2323
- **Streaming list**, with no need to paginate.
2424
- Automatic [**multipart uploads**](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) for large file objects.
25+
- File-like object API and [fsspec](https://github.com/fsspec/filesystem_spec) integration.
26+
- Easy to install with **no required Python dependencies**.
2527
- Support for **conditional put** ("put if not exists"), as well as custom tags and attributes.
2628
- Optionally return list results in [Apache Arrow](https://arrow.apache.org/) format, which is faster and more memory-efficient than materializing Python `dict`s.
27-
- File-like object API and [fsspec](https://github.com/fsspec/filesystem_spec) integration.
28-
- Easy to install with no required Python dependencies.
29-
- The [underlying Rust library](https://docs.rs/object_store) is production quality and used in large scale production systems, such as the Rust package registry [crates.io](https://crates.io/).
3029
- Zero-copy data exchange between Rust and Python via the [buffer protocol](https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/).
3130

3231
<!-- For Rust developers looking to add object_store support to their Python packages, refer to pyo3-object_store. -->

docs/assets/fsspec-type-hinting.jpg

84.4 KB

docs/fsspec.md

+132
@@ -0,0 +1,132 @@
1+
# fsspec Integration
2+
3+
Obstore provides native integration with the [fsspec] ecosystem.
4+
5+
[fsspec]: https://github.com/fsspec/filesystem_spec
6+
7+
The fsspec integration is best effort and may not provide the same
8+
performance as the rest of obstore. Where possible, implementations should use
9+
the underlying `obstore` APIs directly. If you find any bugs with this
10+
integration, please [file an
11+
issue](https://github.com/developmentseed/obstore/issues/new/choose).
12+
13+
## Usage
14+
15+
### Direct class usage
16+
17+
Construct an fsspec-compatible filesystem with [`FsspecStore`][obstore.fsspec.FsspecStore]. This implements [`AbstractFileSystem`][fsspec.spec.AbstractFileSystem], so you can use it wherever an API expects an fsspec-compatible filesystem.
18+
19+
```py
20+
from obstore.fsspec import FsspecStore
21+
22+
fs = FsspecStore("s3", region="us-west-2", skip_signature=True)
23+
prefix = (
24+
    "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/"
25+
)
26+
items = fs.ls(prefix)
27+
# [{'name': 'sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/AOT.tif',
28+
# 'size': 80689,
29+
# 'type': 'file',
30+
# 'e_tag': '"c93b0f6b0e2cf8e375968f41161f9df7"'},
31+
# ...
32+
```
33+
34+
If you need a readable or writable file-like object, you can call the `open`
35+
method provided on `FsspecStore`, or you may construct a
36+
[`BufferedFile`][obstore.fsspec.BufferedFile] directly.
37+
38+
```py
39+
from obstore.fsspec import FsspecStore
40+
41+
fs = FsspecStore("s3", region="us-west-2", skip_signature=True)
42+
43+
with fs.open(
44+
    "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/thumbnail.jpg",
45+
) as file:
46+
    content = file.read()
47+
```
48+
49+
Using the `FsspecStore` class directly may be preferred because the type hinting should work automatically, which may help IDEs like VSCode suggest valid keyword parameters.
50+
51+
### Register as a global handler
52+
53+
Use [`register`][obstore.fsspec.register] to register obstore as the default
54+
handler for various protocols. Then use [`fsspec.filesystem`][] to create an
55+
fsspec filesystem object for a specific protocol. Or use [`fsspec.open`][] to
56+
open a file given a URL.
57+
58+
```py
59+
import fsspec
60+
from obstore.fsspec import register
61+
62+
# Register obstore as the default handler for all protocols supported by
63+
# obstore.
64+
# You may wish to register only specific protocols, instead.
65+
register()
66+
67+
# Create a new fsspec filesystem for the given protocol
68+
fs = fsspec.filesystem("https")
69+
content = fs.cat_file("https://example.com/")
70+
71+
# Or, open the file directly
73+
url = "https://github.com/opengeospatial/geoparquet/raw/refs/heads/main/examples/example.parquet"
74+
with fsspec.open(url) as file:
75+
    content = file.read()
76+
```
77+
78+
## Store configuration
79+
80+
Some stores may require configuration. You may pass configuration parameters to the [`FsspecStore`][obstore.fsspec.FsspecStore] constructor directly. Or, if you're using [`fsspec.filesystem`][], you may pass configuration parameters to that call, which will pass parameters down to the `FsspecStore` constructor internally.
81+
82+
```py
83+
from obstore.fsspec import FsspecStore
84+
85+
fs = FsspecStore("s3", region="us-west-2", skip_signature=True)
86+
```
87+
88+
Or, with [`fsspec.filesystem`][]:
89+
90+
```py
91+
import fsspec
92+
93+
from obstore.fsspec import register
94+
95+
register("s3")
96+
97+
fs = fsspec.filesystem("s3", region="us-west-2", skip_signature=True)
98+
```
99+
100+
## Type hinting
101+
102+
The fsspec API is not conducive to type checking. The easiest way to get type hinting for parameters is to use [`FsspecStore`][obstore.fsspec.FsspecStore] to construct fsspec-compatible stores instead of [`fsspec.filesystem`][].
103+
104+
[`fsspec.open`][] and [`fsspec.filesystem`][] take arbitrary keyword arguments that they pass down to the underlying store, and these pass-through arguments are not typed.
105+
106+
However, it is possible to get type checking of store configuration by defining config parameters as a dictionary:
107+
108+
```py
109+
from __future__ import annotations
110+
111+
from typing import TYPE_CHECKING
112+
113+
import fsspec
114+
115+
from obstore.fsspec import register
116+
117+
if TYPE_CHECKING:
118+
    from obstore.store import S3ConfigInput
119+
120+
register("s3")
121+
122+
config: S3ConfigInput = {"region": "us-west-2", "skip_signature": True}
123+
fs = fsspec.filesystem("s3", config=config)
124+
```
125+
126+
Then your type checker will validate that the `config` dictionary is compatible with [`S3ConfigInput`][obstore.store.S3ConfigInput]. VSCode also provides auto suggestions for parameters:
127+
128+
![](./assets/fsspec-type-hinting.jpg)
129+
130+
!!! note
131+
132+
    `S3ConfigInput` is a "type-only" construct, and so it needs to be imported from within an `if TYPE_CHECKING` block. Additionally, `from __future__ import annotations` must be at the top of the file.
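
The dictionary-typed config pattern described in the type hinting section can be sketched without obstore installed. In the sketch below, `DemoConfigInput` and `make_filesystem` are hypothetical stand-ins for illustration only; they are not obstore or fsspec names. `make_filesystem` only loosely mimics a factory that forwards untyped keyword arguments. Unlike `S3ConfigInput`, the stand-in `TypedDict` exists at runtime, so it needs no `if TYPE_CHECKING` guard.

```python
from typing import Any, Optional, TypedDict


class DemoConfigInput(TypedDict, total=False):
    """Hypothetical stand-in for a store config type such as S3ConfigInput."""

    region: str
    skip_signature: bool


def make_filesystem(
    protocol: str,
    config: Optional[DemoConfigInput] = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical factory that forwards untyped keyword arguments."""
    # Merge the typed config dict with any pass-through keyword arguments.
    merged = {**(config or {}), **kwargs}
    return {"protocol": protocol, "config": merged}


# A type checker can validate the keys and value types of this dict
# against DemoConfigInput before it is passed along.
config: DemoConfigInput = {"region": "us-west-2", "skip_signature": True}
fs = make_filesystem("s3", config=config)

# Pass-through keyword arguments are typed as Any, so a misspelled key
# such as "regoin" is accepted silently by the type checker.
fs_untyped = make_filesystem("s3", regoin="us-west-2")
```

The design point is that the annotation on `config` gives the type checker something concrete to check, whereas arbitrary `**kwargs: Any` cannot be validated.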

mkdocs.yml

+1
@@ -28,6 +28,7 @@ nav:
2828
- authentication.md
2929
- integrations.md
3030
- performance.md
31+
- fsspec.md
3132
- Alternatives: alternatives.md
3233
- Troubleshooting:
3334
- AWS: troubleshooting/aws.md

obstore/python/obstore/fsspec.py

+29-17
@@ -2,10 +2,8 @@
22
33
[fsspec]: https://github.com/fsspec/filesystem_spec
44
5-
The fsspec integration is **best effort** and not the primary API of `obstore`. This
6-
integration may not be as stable and may not provide the same performance as the rest of
7-
the library. Changes may be made even in patch releases to align better with fsspec
8-
expectations. If you find any bugs, please [file an
5+
The fsspec integration is best effort and may not provide the same performance as
6+
the rest of obstore. If you find any bugs with this integration, please [file an
97
issue](https://github.com/developmentseed/obstore/issues/new/choose).
108
119
The underlying `object_store` Rust crate
@@ -39,7 +37,7 @@
3937
from collections import defaultdict
4038
from functools import lru_cache
4139
from pathlib import Path
42-
from typing import TYPE_CHECKING, Any, Literal, overload
40+
from typing import TYPE_CHECKING, Any, Literal, Unpack, overload
4341
from urllib.parse import urlparse
4442

4543
import fsspec.asyn
@@ -65,6 +63,12 @@
6563
S3ConfigInput,
6664
)
6765

66+
__all__ = [
67+
"BufferedFile",
68+
"FsspecStore",
69+
"register",
70+
]
71+
6872
SUPPORTED_PROTOCOLS: set[str] = {
6973
"abfs",
7074
"abfss",
@@ -113,46 +117,49 @@ class FsspecStore(fsspec.asyn.AsyncFileSystem):
113117
@overload
114118
def __init__(
115119
self,
116-
*args: Any,
117120
protocol: Literal["s3", "s3a"],
121+
*args: Any,
118122
config: S3Config | S3ConfigInput | None = None,
119123
client_options: ClientConfig | None = None,
120124
retry_config: RetryConfig | None = None,
121125
asynchronous: bool = False,
122126
max_cache_size: int = 10,
123127
loop: Any = None,
124128
batch_size: int | None = None,
129+
**kwargs: Unpack[S3ConfigInput],
125130
) -> None: ...
126131
@overload
127132
def __init__(
128133
self,
129-
*args: Any,
130134
protocol: Literal["gs"],
135+
*args: Any,
131136
config: GCSConfig | GCSConfigInput | None = None,
132137
client_options: ClientConfig | None = None,
133138
retry_config: RetryConfig | None = None,
134139
asynchronous: bool = False,
135140
max_cache_size: int = 10,
136141
loop: Any = None,
137142
batch_size: int | None = None,
143+
**kwargs: Unpack[GCSConfigInput],
138144
) -> None: ...
139145
@overload
140146
def __init__(
141147
self,
142-
*args: Any,
143148
protocol: Literal["az", "adl", "azure", "abfs", "abfss"],
149+
*args: Any,
144150
config: AzureConfig | AzureConfigInput | None = None,
145151
client_options: ClientConfig | None = None,
146152
retry_config: RetryConfig | None = None,
147153
asynchronous: bool = False,
148154
max_cache_size: int = 10,
149155
loop: Any = None,
150156
batch_size: int | None = None,
157+
**kwargs: Unpack[AzureConfigInput],
151158
) -> None: ...
152159
def __init__( # noqa: PLR0913
153160
self,
161+
protocol: SUPPORTED_PROTOCOLS_T | str | None = None,
154162
*args: Any,
155-
protocol: str | None = None,
156163
config: (
157164
S3Config
158165
| S3ConfigInput
@@ -168,6 +175,7 @@ def __init__( # noqa: PLR0913
168175
max_cache_size: int = 10,
169176
loop: Any = None,
170177
batch_size: int | None = None,
178+
**kwargs: Any,
171179
) -> None:
172180
"""Construct a new FsspecStore.
173181
@@ -197,15 +205,16 @@ def __init__( # noqa: PLR0913
197205
batch_size: some operations on many files will batch their requests; if you
198206
are seeing timeouts, you may want to set this number smaller than the
199207
defaults, which are determined in `fsspec.asyn._get_batch_size`.
208+
kwargs: per-store configuration passed down to store-specific builders.
200209
201210
**Examples:**
202211
203212
```py
204213
from obstore.fsspec import FsspecStore
205214
206-
store = FsspecStore(protocol="https")
207-
resp = store.cat("https://example.com")
208-
assert resp.startswith(b"<!doctype html>")
215+
store = FsspecStore("https")
216+
resp = store.cat_file("https://raw.githubusercontent.com/developmentseed/obstore/refs/heads/main/README.md")
217+
assert resp.startswith(b"# obstore")
209218
```
210219
211220
"""
@@ -223,6 +232,7 @@ def __init__( # noqa: PLR0913
223232
self.config = config
224233
self.client_options = client_options
225234
self.retry_config = retry_config
235+
self.config_kwargs = kwargs
226236

227237
# https://stackoverflow.com/a/68550238
228238
self._construct_store = lru_cache(maxsize=max_cache_size)(self._construct_store)
@@ -279,6 +289,7 @@ def _construct_store(self, bucket: str) -> ObjectStore:
279289
config=self.config,
280290
client_options=self.client_options,
281291
retry_config=self.retry_config,
292+
**self.config_kwargs,
282293
)
283294

284295
async def _rm_file(self, path: str, **_kwargs: Any) -> None:
@@ -782,11 +793,12 @@ def register(
782793
783794
Args:
784795
protocol: A single protocol (e.g., "s3", "gcs", "abfs") or
785-
a list of protocols to register FsspecStore for. Defaults to `None`, which
786-
will register `obstore` as the provider for all [supported
787-
protocols][obstore.fsspec.SUPPORTED_PROTOCOLS] **except** for `file://` and
788-
`memory://`. If you wish to use `obstore` via fsspec for `file://` or
789-
`memory://` URLs, list them explicitly.
796+
a list of protocols to register FsspecStore for.
797+
798+
Defaults to `None`, which will register `obstore` as the provider for all
799+
[supported protocols][obstore.fsspec.SUPPORTED_PROTOCOLS] **except** for
800+
`file://` and `memory://`. If you wish to use `obstore` via fsspec for
801+
`file://` or `memory://` URLs, list them explicitly.
790802
asynchronous: If `True`, the registered store will support
791803
asynchronous operations. Defaults to `False`.
792804
