Open
Description
Describe the issue:
When assigning a new column to a dask object, it seems like the concrete subtype (e.g. geopandas.GeoDataFrame) is lost.
Minimal Complete Verifiable Example:
import dask.array
import dask.dataframe
import dask_geopandas
import geopandas
import pandas as pd
df = geopandas.GeoDataFrame({"geometry": geopandas.points_from_xy([0, 0], [0, 1])})
ddf = dask_geopandas.from_geopandas(df, npartitions=2)
ddf = ddf.clear_divisions() # this is important
b = dask.dataframe.from_dask_array(dask.array.zeros((2,), chunks=(1, 1)), index=ddf.index)
ddf.assign(a=b).geometry.x.compute()  # error
that raises
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/x7/__bs9yvx21qbvzb17sj4qsh40000gn/T/ipykernel_95282/3433075730.py in ?()
8 ddf = dask_geopandas.from_geopandas(df, npartitions=2)
9 ddf = ddf.clear_divisions() # this is important
10
11 b = dask.dataframe.from_dask_array(dask.array.zeros((2,), chunks=(1, 1)), index=ddf.index)
---> 12 ddf.assign(a=b).geometry.x.compute() ## error
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/dask_expr/_collection.py in ?(self, fuse, concatenate, **kwargs)
476 out = self
477 if not isinstance(out, Scalar) and concatenate:
478 out = out.repartition(npartitions=1)
479 out = out.optimize(fuse=fuse)
--> 480 return DaskMethodsMixin.compute(out, **kwargs)
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/dask/base.py in ?(self, **kwargs)
368 See Also
369 --------
370 dask.compute
371 """
--> 372 (result,) = compute(self, traverse=False, **kwargs)
373 return result
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/dask/base.py in ?(traverse, optimize_graph, scheduler, get, *args, **kwargs)
656 keys.append(x.__dask_keys__())
657 postcomputes.append(x.__dask_postcompute__())
658
659 with shorten_traceback():
--> 660 results = schedule(dsk, keys, **kwargs)
661
662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
~/gh/TomAugspurger/dask-geopandas-spatial-partitioning/.direnv/python-3.12/lib/python3.12/site-packages/pandas/core/generic.py in ?(self, name)
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'x'
geopandas.GeoSeries objects provide .x and .y for geometry columns. We're getting a regular pandas.Series instead, which causes the error.
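To illustrate the failure mode without geopandas or dask, here is a pandas-only sketch. `TaggedSeries` and its `tag` property are hypothetical stand-ins for `geopandas.GeoSeries` and `.x`. This is not the dask-expr code path; it only shows that subclass attributes disappear as soon as an object is rebuilt through the base `pandas.Series` class.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    """Hypothetical Series subclass standing in for geopandas.GeoSeries."""

    @property
    def _constructor(self):
        # pandas calls this when an operation builds a new object,
        # which is how subclasses survive most operations.
        return TaggedSeries

    @property
    def tag(self):
        # Stand-in for a subclass-only attribute such as GeoSeries.x.
        return "tagged"


s = TaggedSeries([1, 2])

# Arithmetic goes through _constructor, so the subclass is kept.
kept = s + 1

# Rebuilding through the base class drops the subclass and its attributes.
lost = pd.Series(s)

print(type(kept).__name__, hasattr(kept, "tag"))
print(type(lost).__name__, hasattr(lost, "tag"))
```

If something similar happens to the partitions (or the meta) somewhere in the assign path, `.geometry` would hand back a plain `pandas.Series`, which matches the traceback above.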
Anything else we need to know?:
Having unknown divisions does seem to be necessary: commenting out the ddf = ddf.clear_divisions() line makes the error go away. So I think we can narrow the search to AssignAlign (and not Assign).
Environment:
- Dask version: 2024.12.0
- dask-expr from main @ d7577a2
- Python version:
- Operating System:
- Install method (conda, pip, source):