Merged
10 changes: 6 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Xpublish

Publish Xarray Datasets to the web
Publish Xarray Datasets and DataTrees to the web

<!-- badges-start -->

@@ -18,12 +18,14 @@ Publish Xarray Datasets to the web

## A quick example

**Serverside: Publish a Xarray Dataset through a rest API**
**Serverside: Publish an Xarray Dataset or DataTree through a REST API**

<!-- server-example-start -->

```python
ds.rest.serve(host="0.0.0.0", port=9000)
# or, for a hierarchical DataTree, the API is identical:
dt.rest.serve(host="0.0.0.0", port=9000)
```

<!-- server-example-end -->
@@ -55,9 +57,9 @@ Or to explore other access methods, open [http://0.0.0.0:9000/docs](http://0.0.0

## Why?

Xpublish lets you serve/share/publish Xarray Datasets via a web application.
Xpublish lets you serve/share/publish Xarray Datasets and DataTrees via a web application.

The data and/or metadata in the Xarray Datasets can be exposed in various forms through [pluggable REST API endpoints](https://xpublish.readthedocs.io/en/latest/user-guide/plugins.html).
The data and/or metadata can be exposed in various forms through [pluggable REST API endpoints](https://xpublish.readthedocs.io/en/latest/user-guide/plugins.html). Hierarchical data is supported natively — bare Datasets are wrapped in a single-node DataTree internally so the same routes, accessors, and plugins work whether you're serving a flat dataset or a deeply nested tree.
Efficient, on-demand delivery of large datasets may be enabled with Dask on the server side.

Xpublish's [plugin ecosystem](https://xpublish.readthedocs.io/en/latest/ecosystem/index.html#plugins) has capabilities including:
22 changes: 21 additions & 1 deletion docs/source/api/included_plugins.md
@@ -8,7 +8,21 @@ Xpublish includes a set of built in plugins with associated endpoints.

## Dataset Info

The dataset info plugin provides a handful of default ways to display datasets and their metadata.
The dataset info plugin provides a handful of default ways to display datasets
and their metadata. Endpoints come in two flavors:

- **Root endpoints** — `/`, `/keys`, `/dict`, `/info` — operate on the root node
of the underlying {py:class}`xarray.DataTree`. For a flat dataset this is just
the dataset itself; for a DataTree it is the root group.
- **Group-aware endpoints** — `/groups/{group_path:path}/`,
`/groups/{group_path:path}/keys`, `/groups/{group_path:path}/dict`, and
`/groups/{group_path:path}/info` — return the same information for the node at
the given group path in the tree.

In addition, two tree-shaped endpoints expose the DataTree directly:

- `/tree` — HTML representation of the full DataTree.
- `/groups` — JSON list of every group path in the tree (e.g. `["/", "/a", "/a/b"]`).
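
To make the shape of the `/groups` response concrete, here is a minimal sketch that walks a nested mapping and collects every group path, root first. The nested `dict` is only a stand-in for a DataTree, and `list_group_paths` is a hypothetical helper for illustration, not part of Xpublish.

```python
def list_group_paths(tree: dict, prefix: str = "") -> list[str]:
    """Collect the path of every node in a nested mapping, root first."""
    # The root has an empty prefix and is reported as "/".
    paths = [prefix or "/"]
    for name, child in tree.items():
        paths.extend(list_group_paths(child, f"{prefix}/{name}"))
    return paths


# Stand-in for a tree whose root has a child group "a" that contains "b".
example_tree = {"a": {"b": {}}}
print(list_group_paths(example_tree))
```

A flat dataset corresponds to an empty mapping here, for which the helper returns only the root path.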

```{eval-rst}
.. autosummary::
@@ -22,6 +36,12 @@ The dataset info plugin provides a handful of default ways to display datasets a
/datasets/{dataset_id}/keys
/datasets/{dataset_id}/dict
/datasets/{dataset_id}/info
/datasets/{dataset_id}/tree
/datasets/{dataset_id}/groups
/datasets/{dataset_id}/groups/{group_path}
/datasets/{dataset_id}/groups/{group_path}/keys
/datasets/{dataset_id}/groups/{group_path}/dict
/datasets/{dataset_id}/groups/{group_path}/info
```

## Module Version
55 changes: 54 additions & 1 deletion docs/source/api/index.md
@@ -16,7 +16,9 @@ plugins
## Top-level Rest class

The {class}`~xpublish.Rest` class can be used for publishing a
{class}`xarray.Dataset` object or a collection of Dataset objects.
{class}`xarray.Dataset` or {class}`xarray.DataTree` object, or a collection of either.
A bare Dataset is wrapped in a single-node DataTree internally so the rest of the
library operates uniformly on hierarchical data.

These are the main interfaces to Xpublish that most users will use.

@@ -44,6 +46,7 @@ by plugin dependencies.
Rest.setup_datasets
Rest.get_datasets_from_plugins
Rest.get_dataset_from_plugins
Rest.get_datatree_from_plugins
Rest.setup_plugins
Rest.init_cache_kwargs
Rest.init_app_kwargs
@@ -120,6 +123,50 @@ dataset. Proper use of this accessor should be like:
Dataset.rest.serve
```

## DataTree.rest (xarray accessor)

The same accessor is registered on {py:class}`xarray.DataTree`, exposing the
same interface for publishing a single hierarchical tree:

```
>>> import xarray as xr
>>> import xpublish
>>> dt = xr.DataTree() # or load one with xr.open_datatree(...)
>>> dt.rest(...) # configure (optional)
>>> dt.rest.serve() # serve the tree
```

**Calling the accessor**

```{eval-rst}
.. autosummary::
   :toctree: generated/
   :template: autosummary/accessor_callable.rst

   DataTree.rest
```

**Properties**

```{eval-rst}
.. autosummary::
   :toctree: generated/
   :template: autosummary/accessor_attribute.rst

   DataTree.rest.app
   DataTree.rest.cache
```

**Methods**

```{eval-rst}
.. autosummary::
   :toctree: generated/
   :template: autosummary/accessor_method.rst

   DataTree.rest.serve
```

## FastAPI dependencies

The functions below are defined in module `xpublish.dependencies` and can
@@ -139,7 +186,13 @@ passed in to the `Plugin.app_router` or `Plugin.dataset_router` method.

get_dataset_ids
get_dataset
get_datatree
get_cache
get_plugins
get_plugin_manager
```

When a route declares a `{group_path:path}` segment, `get_dataset` returns
the Dataset at that node of the underlying DataTree (or the root dataset if no
`group_path` is present). `get_datatree` returns the subtree rooted at the
requested group.
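
The lookup rule described above (the node at `group_path`, or the root when no path is given) can be modeled with plain Python. This is a sketch under stated assumptions, not the Xpublish implementation: the tree is a nested `dict` with hypothetical `"data"` and `"children"` keys, and `resolve_group` and `GroupNotFound` are illustrative names.

```python
from typing import Optional


class GroupNotFound(LookupError):
    """Signals an unknown group path (a server would answer with a 404)."""


def resolve_group(tree: dict, group_path: Optional[str] = None) -> dict:
    """Return the node at ``group_path``, or the root when no path is given."""
    node = tree
    for part in (group_path or "").strip("/").split("/"):
        if not part:
            continue  # empty path or doubled slash: stay on the current node
        try:
            node = node["children"][part]
        except KeyError:
            raise GroupNotFound(group_path) from None
    return node


# Hypothetical tree: root -> "a" -> "b"
tree = {
    "data": "root",
    "children": {"a": {"data": "A", "children": {"b": {"data": "B", "children": {}}}}},
}
print(resolve_group(tree)["data"])
print(resolve_group(tree, "a/b")["data"])
```

In this model, returning the matched node itself corresponds to handing back a subtree, while returning only its `"data"` corresponds to handing back the Dataset stored at that node.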
8 changes: 8 additions & 0 deletions docs/source/conf.py
@@ -68,6 +68,14 @@
# https://autodoc-pydantic.readthedocs.io/en/stable/users/configuration.html#show-schema-json-error-strategy
autodoc_pydantic_model_show_json_error_strategy = 'coerce'

# Skip rendering the JSON schema collapsible block for pydantic models.
# Several xpublish models (e.g. Dependencies) hold Callable fields that aren't
# JSON-serializable; on autodoc_pydantic 2.2.0 + pydantic 2.13 the second
# invocation of the "coerce" sanitized-model fallback for the same model
# returns a sibling whose core schema is still a MockCoreSchema and raises
# `PydanticUserError: <Model> is not fully defined`.
autodoc_pydantic_model_show_json = False

myst_enable_extensions = []
myst_heading_anchors = 6

@@ -9,6 +9,12 @@ This also allows organizations to quickly be able to adapt Xpublish to work in t

With this plugin, Xpublish can serve the same datasets as we explicitly defined and loaded in [serving multiple datasets](./serving-multiple-datasets.md), as well as any others supported by [`xr.tutorial`](https://github.com/pydata/xarray/blob/main/xarray/tutorial.py).

The plugin implements {py:meth}`xpublish.plugins.hooks.PluginSpec.get_datatree` —
the modern provider hook. The older `get_dataset` hook is still honored for
backwards compatibility (with a {py:class}`DeprecationWarning`) but new plugins
should always implement `get_datatree`. See the [DataTrees tutorial](./datatrees.md)
for the lazy-by-group pattern used by Zarr/Icechunk-backed providers.

```{note}
For more details on building dataset provider plugins, please see the [plugin user guide](../../user-guide/plugins.md#dataset-provider-plugins)
```
10 changes: 8 additions & 2 deletions docs/source/getting-started/tutorial/dataset-provider-plugin.py
@@ -12,11 +12,17 @@ def get_datasets(self):
        return list(xr.tutorial.file_formats)

    @hookimpl
    def get_dataset(self, dataset_id: str):
    def get_datatree(self, dataset_id: str, group: str):
        # The xarray tutorial datasets are flat, so we only serve the root.
        # Note: ``group`` must be a positional parameter (no default) — pluggy
        # will not forward arguments that have defaults to the hookimpl.
        if group:
            return None
        try:
            return xr.tutorial.open_dataset(dataset_id)
            ds = xr.tutorial.open_dataset(dataset_id)
        except HTTPError:
            return None
        return xr.DataTree(dataset=ds)


rest = Rest({})
130 changes: 130 additions & 0 deletions docs/source/getting-started/tutorial/datatrees.md
@@ -0,0 +1,130 @@
# Serving DataTrees

Xpublish treats {py:class}`xarray.DataTree` as its core data primitive. A bare
{py:class}`xarray.Dataset` is just a one-node tree under the hood, so everything
you've learned so far about serving Datasets applies unchanged when you switch
to trees.

## Serving a single DataTree

You can publish a `DataTree` directly with {class}`~xpublish.SingleDatasetRest`
or the `.rest` accessor — the API is identical to the Dataset case:

```python
import xarray as xr
import xpublish

dt = xr.DataTree(name="root")
dt["a"] = xr.DataTree(dataset=xr.Dataset({"x": ("i", [1, 2, 3])}))
dt["a/b"] = xr.DataTree(dataset=xr.Dataset({"y": ("j", [10.0, 20.0])}))

rest = xpublish.SingleDatasetRest(dt)
rest.serve()
# or, equivalently, through the accessor:
# dt.rest.serve()
```

## Serving a collection of trees (and datasets)

{class}`~xpublish.Rest` accepts a mapping whose values can be either
`Dataset` or `DataTree` objects in any combination:

```python
rest = xpublish.Rest(
    {
        "flat": xr.Dataset({"var": ("x", [1, 2, 3])}),
        "tree": dt,
    }
)
rest.serve()
```

The flat dataset is wrapped in a single-node tree internally, so it shows up
in the `/groups` listing as just `["/"]`.

## Navigating groups via the URL

Per-dataset routes can include an optional `{group_path:path}` segment to
navigate into a node of the tree. The included `dataset_info` plugin uses this
convention to expose group-aware variants of its endpoints:

| URL | What it returns |
| -------------------------------- | ------------------------------------ |
| `/datasets/tree/` | HTML repr of the root node |
| `/datasets/tree/keys` | Variable keys at the root |
| `/datasets/tree/groups` | List of every group path in the tree |
| `/datasets/tree/tree` | HTML repr of the full DataTree |
| `/datasets/tree/groups/a/keys` | Variable keys at the `/a` node |
| `/datasets/tree/groups/a/b/info` | Schema info at the `/a/b` node |

Group paths can be arbitrarily nested — the `{group_path:path}` parameter
matches across slashes. An unknown group returns a `404`.
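
As a client-side illustration of the URL layout in the table, the sketch below assembles endpoint URLs with the standard library. The base address `http://localhost:9000` and the helper name `group_url` are assumptions made for the example; only the path layout comes from the table above.

```python
from typing import Optional
from urllib.parse import quote


def group_url(
    base: str,
    dataset_id: str,
    endpoint: str = "",
    group_path: Optional[str] = None,
) -> str:
    """Build the URL of a root or group-aware endpoint for one dataset."""
    parts = [base.rstrip("/"), "datasets", quote(dataset_id, safe="")]
    if group_path:
        # Keep the slashes so nested groups map onto nested URL segments.
        parts += ["groups", quote(group_path.strip("/"), safe="/")]
    return "/".join(parts) + "/" + endpoint


base = "http://localhost:9000"  # assumed host and port
print(group_url(base, "tree", "keys"))
print(group_url(base, "tree", "info", group_path="a/b"))
```

Passing no `endpoint` yields the trailing-slash form used for the HTML representation of a node.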

## Dataset provider plugins for trees

The provider hook for plugins is
{py:meth}`xpublish.plugins.hooks.PluginSpec.get_datatree`. It receives both the
`dataset_id` and the requested `group` path, and returns the
{py:class}`xarray.DataTree` rooted at that group (or `None` to pass to the next
plugin). The returned tree's root corresponds to the requested group.

```{important}
``group`` must be declared as a required parameter (no default value) on your
hookimpl. [Pluggy](https://pluggy.readthedocs.io/) does not forward arguments
that have defaults, so a signature like ``def get_datatree(self, dataset_id, group="")``
always sees its default empty string, regardless of the URL. See the
[plugin user guide](../../user-guide/plugins.md#dataset-provider-plugins) for
details.
```
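
The effect of a default on `group` can be reproduced without Pluggy. The sketch below is a simplified stand-in for a hook caller that, like the behavior described in the note, passes only the parameters declared without defaults. It is not Pluggy's code, and `forwarded_names`, `call_hook`, `good_impl`, and `bad_impl` are hypothetical names.

```python
import inspect


def forwarded_names(func) -> list[str]:
    """Names a pluggy-style caller would pass: parameters without defaults."""
    params = inspect.signature(func).parameters.values()
    return [
        p.name
        for p in params
        if p.default is inspect.Parameter.empty and p.name != "self"
    ]


def call_hook(func, **available):
    """Call ``func`` with only the arguments it declares without defaults."""
    return func(**{name: available[name] for name in forwarded_names(func)})


def good_impl(dataset_id, group):
    return group


def bad_impl(dataset_id, group=""):
    # ``group`` has a default, so the caller never supplies it.
    return group


print(repr(call_hook(good_impl, dataset_id="tree", group="a/b")))
print(repr(call_hook(bad_impl, dataset_id="tree", group="a/b")))
```

The second call returns the default empty string even though a group was requested, which is why such a provider would only ever serve the root.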

### The lazy-by-group pattern

For backends where loading the whole tree is expensive (Zarr v3, Icechunk,
remote object stores), implement `get_datatree` so it opens *only* the
requested group and wraps it in a single-node tree:

```python
import xarray as xr
from xpublish import Plugin, hookimpl


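class IcechunkProvider(Plugin):
    name: str = "icechunk"

    # ``_known_repos`` and ``_store_for`` stand in for however your plugin
    # tracks its repositories; they are not defined in this snippet.

    @hookimpl
    def get_datasets(self):
        return list(self._known_repos)

    @hookimpl
    def get_datatree(self, dataset_id: str, group: str):
        store = self._store_for(dataset_id)
        if store is None:
            return None
        # Open only the requested group; an empty path means the root group.
        ds = xr.open_zarr(store, group=group or None, consolidated=False)
        return xr.DataTree(dataset=ds)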
```

Each request opens just the one group being viewed, so cost stays proportional
to what's actually queried.

## Migrating from `get_dataset`

The older {py:meth}`xpublish.plugins.hooks.PluginSpec.get_dataset` hook is still
honored but emits a {py:class}`DeprecationWarning`. The Dataset it returns is
wrapped in a single-node DataTree, so only the root group is reachable through
it. Migrate to `get_datatree` to expose hierarchical data — the rename is
mechanical:

```python
# Before
@hookimpl
def get_dataset(self, dataset_id: str):
    return xr.tutorial.open_dataset(dataset_id)


# After
@hookimpl
def get_datatree(self, dataset_id: str, group: str):
    if group:
        return None  # we only serve a flat dataset
    return xr.DataTree(dataset=xr.tutorial.open_dataset(dataset_id))
```
1 change: 1 addition & 0 deletions docs/source/getting-started/tutorial/index.md
@@ -11,6 +11,7 @@ hidden:
introduction
dataset-router
serving-multiple-datasets
datatrees
using-plugins
dataset-router-plugin
dataset-provider-plugin
4 changes: 4 additions & 0 deletions docs/source/getting-started/tutorial/introduction.md
@@ -32,6 +32,10 @@ for more convenience:
ds.rest
```

The same accessor is registered on {py:class}`xarray.DataTree` — `dt.rest`
works exactly like `ds.rest`. See the [DataTrees tutorial](./datatrees.md) for
how hierarchical data is served and navigated.

Optional customization of the underlying [FastAPI application](https://fastapi.tiangolo.com) or the server-side [cache](https://github.com/dask/cachey) is possible, e.g.,

```python
2 changes: 1 addition & 1 deletion docs/source/getting-started/why-xpublish.md
@@ -4,7 +4,7 @@ Xarray provides an intuitive API on top of a foundational data model, labeled ar
This API and data model have formed the basis for a large and growing ecosystem of tools.

Xpublish stands on the shoulders of Xarray and the greater PyData ecosystem enabling both new and old users, interactions, and clients.
Xpublish does this by using Xarray datasets as the core data interchange format within the server, and surrounding that with an ecosystem of plugins.
Xpublish does this by using Xarray datasets and DataTrees as the core data interchange format within the server, and surrounding that with an ecosystem of plugins.

```{warning} Hold on to your hats, we're about to say Xpublish a lot
<div style='position:relative; padding-bottom:calc(75.00% + 44px)'><iframe src='https://gfycat.com/ifr/ShadowyHoarseInganue' frameborder='0' scrolling='no' width='100%' height='100%' style='position:absolute;top:0;left:0;' allowfullscreen></iframe></div><p> <a href="https://gfycat.com/shadowyhoarseinganue">via Gfycat</a></p>