Description
Hello, I am running into an issue where using datashader on a `DaskGeoDataFrame` results in an error. To reproduce, I have the following Poetry environment running on Ubuntu 22.04.5 LTS:
```toml
python = ">=3.12,<3.13"
spatialpandas = "0.5.0"
dask = "2025.3.0"
datashader = "0.17.0"
numpy = "2.1.3"
```
I followed this blog post from HoloViz to set up the `DaskGeoDataFrame`; the code that triggers the error is below:
```python
from pathlib import Path

from datashader import Canvas
from spatialpandas.dask import DaskGeoDataFrame
from spatialpandas.io import read_parquet_dask


def run():
    pq_file = Path(__file__).parent / "data" / "test.parq"
    gdf = read_parquet_dask(pq_file)
    assert isinstance(gdf, DaskGeoDataFrame)
    canvas = Canvas()
    canvas.points(gdf, geometry="geometry")


if __name__ == "__main__":
    run()
```
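To help narrow this down, a check I would try (a sketch; I have not run it on the failing versions) is materializing the frame first and aggregating the in-memory `GeoDataFrame`, which should show whether the regression is specific to datashader's dask code path:

```python
# Sketch: compute() the DaskGeoDataFrame into an in-memory spatialpandas
# GeoDataFrame and aggregate that instead. If this succeeds on the new
# versions, the failure is likely in the dask pipeline rather than the
# points glyph itself.
pdf = gdf.compute()
canvas.points(pdf, geometry="geometry")
```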
This gives the following error:
```
Traceback (most recent call last):
  File "2025-03-27_minimal.py", line 54, in <module>
    run()
  File "2025-03-27_minimal.py", line 50, in run
    canvas.points(gdf, geometry="geometry")
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/core.py", line 229, in points
    return bypixel(source, self, glyph, agg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/core.py", line 1351, in bypixel
    return bypixel.pipeline(source, schema, canvas, glyph, agg, antialias=antialias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/utils.py", line 121, in __call__
    return lk[cls](head, *rest, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/datashader/data_libraries/dask.py", line 42, in dask_pipeline
    return da.compute(dsk, scheduler=scheduler)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/dask/base.py", line 656, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/dask/local.py", line 455, in get_async
    raise ValueError("Found no accessible jobs in dask")
ValueError: Found no accessible jobs in dask

Process finished with exit code 1
```
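Since the failure comes out of `get_async` in `dask/local.py`, one thing that might help isolate it (a sketch using the standard `dask.config` API; I haven't verified it changes the outcome on the new versions) is forcing the single-threaded scheduler:

```python
import dask

# Force the synchronous (single-threaded) scheduler for the aggregation,
# to rule out thread-pool / local-scheduler effects.
with dask.config.set(scheduler="synchronous"):
    canvas.points(gdf, geometry="geometry")
```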
To get the code to work, I had to revert the packages to the following:
```toml
python = ">=3.12,<3.13"
spatialpandas = "0.4.10"
dask = "2024.12.1"
datashader = "0.17.0"
numpy = "1.26.4"
```
The only output now is a bunch of warnings:
```
/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/dask/dataframe/__init__.py:49: FutureWarning:
Dask dataframe query planning is disabled because dask-expr is not installed.
You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.

  warnings.warn(msg, FutureWarning)
/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/spatialpandas/io/parquet.py:353: FutureWarning: Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 and will be removed in a future version.
  d = ParquetDataset(
/home/titanium/.cache/pypoetry/virtualenvs/sandbox-datashader2-_RrFaDUd-py3.12/lib/python3.12/site-packages/spatialpandas/io/parquet.py:137: FutureWarning: Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 and will be removed in a future version.
  dataset = ParquetDataset(
# the same warning is repeated many times

Process finished with exit code 0
```
I wasn't sure how to create an empty `DaskGeoDataFrame`, so to generate the parquet file I downloaded one of the CSV files mentioned in the HoloViz blog post above and ran the script below:
```python
from pathlib import Path

import dask.dataframe as dd
import numpy as np
from dask.diagnostics import ProgressBar
from spatialpandas import GeoDataFrame
from spatialpandas.geometry import PointArray


def lon_lat_to_easting_northing(longitude, latitude):
    # Copied here to avoid a dependency on holoviews.
    # Standard lon/lat (EPSG:4326) -> Web Mercator (EPSG:3857) conversion.
    origin_shift = np.pi * 6378137
    easting = longitude * origin_shift / 180.0
    with np.errstate(divide="ignore", invalid="ignore"):
        northing = (
            np.log(np.tan((90 + latitude) * np.pi / 360.0)) * origin_shift / np.pi
        )
    return easting, northing


def convert_partition(df):
    east, north = lon_lat_to_easting_northing(
        df["LON"].astype("float32"), df["LAT"].astype("float32")
    )
    return GeoDataFrame({"geometry": PointArray((east, north))})


def convert_csv_to_gdf():
    base_dir = Path(__file__).parent / "data"
    csv_files = base_dir / "AIS_2020_01*.csv"
    pq_file = base_dir / "test.parq"
    # Empty frame used as the meta/template for map_partitions.
    example = GeoDataFrame({"geometry": PointArray([], dtype="float32")})
    with ProgressBar():
        print("Reading csv files")
        gdf = dd.read_csv(csv_files, assume_missing=True)
        gdf = gdf.map_partitions(convert_partition, meta=example)
        print("Writing parquet file")
        gdf = gdf.pack_partitions_to_parquet(pq_file, npartitions=64)
    return gdf


if __name__ == "__main__":
    convert_csv_to_gdf()
```
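As a quick sanity check that the conversion itself is sound (this only relies on the standard Web Mercator bound, pi * 6378137 ≈ 20037508.34 m, so it should hold regardless of package versions):

```python
import numpy as np

# lon=180, lat=0 should map to the Web Mercator x-bound with northing ~0.
east, north = lon_lat_to_easting_northing(180.0, 0.0)
assert abs(east - np.pi * 6378137) < 1e-6
assert abs(north) < 1e-6
```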
The conversion script was run with the following versions:
```toml
python = ">=3.12,<3.13"
spatialpandas = "0.4.10"
dask = "2024.12.1"
datashader = "0.17.0"
numpy = "1.26.4"
```
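On generating test data without the CSV download: I believe (an assumption based on the spatialpandas docs, not something the blog post shows) a small `DaskGeoDataFrame` can also be built in memory, since `dd.from_pandas` dispatches spatialpandas frames to the dask subclass:

```python
import dask.dataframe as dd
import numpy as np
from spatialpandas import GeoDataFrame
from spatialpandas.dask import DaskGeoDataFrame
from spatialpandas.geometry import PointArray

# Random points in Web Mercator coordinates. from_pandas on a spatialpandas
# GeoDataFrame should come back as a DaskGeoDataFrame via dask's dispatch.
rng = np.random.default_rng(0)
east = rng.uniform(-2e7, 2e7, 1_000).astype("float32")
north = rng.uniform(-2e7, 2e7, 1_000).astype("float32")
pdf = GeoDataFrame({"geometry": PointArray((east, north))})
ddf = dd.from_pandas(pdf, npartitions=4)
assert isinstance(ddf, DaskGeoDataFrame)
```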
This is not exactly a blocker, but it would be nice to be able to use up-to-date packages. Thank you!