DaskGeoDataFrame and datashader incompatibility #178

Open
@ahnsws

Description

Hello, I am running into an issue where using datashader on a DaskGeoDataFrame results in an error. To reproduce, I have the following poetry environment running on Ubuntu 22.04.5 LTS:

python = ">=3.12,<3.13"
spatialpandas = "0.5.0"
dask = "2025.3.0"
datashader = "0.17.0"
numpy = "2.1.3"

I followed this blog post from HoloViz to set up the DaskGeoDataFrame, and the code that generates the error is below:

from pathlib import Path

from datashader import Canvas
from spatialpandas.dask import DaskGeoDataFrame
from spatialpandas.io import read_parquet_dask


def run():
    pq_file = Path(__file__).parent / "data" / "test.parq"

    # Load the packed parquet dataset written by the conversion script further below
    gdf = read_parquet_dask(pq_file)
    assert isinstance(gdf, DaskGeoDataFrame)

    # Rasterizing the points is what raises the error
    canvas = Canvas()
    canvas.points(gdf, geometry="geometry")


if __name__ == "__main__":
    run()

This gives the following error:

Traceback (most recent call last):
  File "2025-03-27_minimal.py", line 54, in <module>
    run()
  File "2025-03-27_minimal.py", line 50, in run
    canvas.points(gdf, geometry="geometry")
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/core.py", line 229, in points
    return bypixel(source, self, glyph, agg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/core.py", line 1351, in bypixel
    return bypixel.pipeline(source, schema, canvas, glyph, agg, antialias=antialias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/utils.py", line 121, in __call__
    return lk[cls](head, *rest, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/data_libraries/dask.py", line 42, in dask_pipeline
    return da.compute(dsk, scheduler=scheduler)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/dask/base.py", line 656, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/dask/local.py", line 455, in get_async
    raise ValueError("Found no accessible jobs in dask")
ValueError: Found no accessible jobs in dask

Process finished with exit code 1
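
I haven't dug into the datashader internals, but for reference, here is a sketch of the kind of check I would use to see whether the Dask graph itself computes outside of datashader (assuming the same test.parq file; untested with the newer versions):

from spatialpandas.io import read_parquet_dask

# Sketch only: checks that the DaskGeoDataFrame computes on its own,
# to separate a dask/spatialpandas problem from a datashader one.
gdf = read_parquet_dask("data/test.parq")
print(gdf.map_partitions(len).compute())  # per-partition row counts
print(gdf.head())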

To get the code to work, I had to revert to the following package versions:

python = ">=3.12,<3.13"
spatialpandas = "0.4.10"
dask = "2024.12.1"
datashader = "0.17.0"
numpy = "1.26.4"

With these versions the only output is a number of FutureWarnings:

/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/dask/dataframe/__init__.py:49: FutureWarning: 
Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.

  warnings.warn(msg, FutureWarning)
/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/spatialpandas/io/parquet.py:353: FutureWarning: Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 and will be removed in a future version.
  d = ParquetDataset(
/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/spatialpandas/io/parquet.py:137: FutureWarning: Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 and will be removed in a future version.
  dataset = ParquetDataset(
[the same warning is repeated many times]

Process finished with exit code 0

I wasn't sure how to create an empty DaskGeoDataFrame for a minimal reproducer, so instead I generated the parquet file by downloading one of the CSV files mentioned in the HoloViz blog post above and running the script below:

from pathlib import Path

import dask.dataframe as dd
import numpy as np
from dask.diagnostics import ProgressBar
from spatialpandas import GeoDataFrame
from spatialpandas.geometry import PointArray


def lon_lat_to_easting_northing(longitude, latitude):
    # copied here to avoid dependency on holoviews
    origin_shift = np.pi * 6378137
    easting = longitude * origin_shift / 180.0
    with np.errstate(divide="ignore", invalid="ignore"):
        northing = (
            np.log(np.tan((90 + latitude) * np.pi / 360.0)) * origin_shift / np.pi
        )
    return easting, northing


def convert_partition(df):
    # Project each partition's lon/lat columns to Web Mercator and wrap
    # the result in a spatialpandas GeoDataFrame of points
    east, north = lon_lat_to_easting_northing(
        df["LON"].astype("float32"), df["LAT"].astype("float32")
    )
    return GeoDataFrame({"geometry": PointArray((east, north))})


def convert_csv_to_gdf():
    base_dir = Path(__file__).parent / "data"
    csv_files = base_dir / "AIS_2020_01*.csv"

    pq_file = base_dir / "test.parq"
    # Empty GeoDataFrame used as the meta/schema for map_partitions
    example = GeoDataFrame({"geometry": PointArray([], dtype="float32")})

    with ProgressBar():
        print("Reading csv files")
        gdf = dd.read_csv(csv_files, assume_missing=True)
        gdf = gdf.map_partitions(convert_partition, meta=example)

        print("Writing parquet file")
        # Spatially sort the points and write them out as 64 parquet partitions
        gdf = gdf.pack_partitions_to_parquet(pq_file, npartitions=64)

    return gdf


if __name__ == "__main__":
    convert_csv_to_gdf()

The conversion script was run with the following versions:

python = ">=3.12,<3.13"
spatialpandas = "0.4.10"
dask = "2024.12.1"
datashader = "0.17.0"
numpy = "1.26.4"

This is not a hard blocker since the downgrade works, but it would be nice to be able to use the updated packages. Thank you!
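
In case it is useful to anyone hitting the same thing, a possible interim workaround (untested with the newer versions, and only viable when the data fits in memory) would be to materialise the DaskGeoDataFrame before rasterizing, so that datashader takes its plain-pandas code path instead of the Dask one:

from datashader import Canvas
from spatialpandas.io import read_parquet_dask

# Untested sketch: .compute() turns the DaskGeoDataFrame into a plain
# spatialpandas GeoDataFrame, bypassing datashader's dask pipeline.
gdf = read_parquet_dask("data/test.parq").compute()
canvas = Canvas()
agg = canvas.points(gdf, geometry="geometry")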
