Open
Description
When I load a CSV first into dask and then into a dask-geopandas dataframe using .from_dask_dataframe, ._meta_nonempty does not exist, causing downstream problems in analysis (e.g. with spatial_shuffle). My hackish solution below takes the head, uses from_geopandas to get the meta, and then replaces the meta in the original. It would be nice to make this just work directly! I'm not sure whether this reproduces for other people.
import dask.dataframe as dd
import dask_geopandas as dg
import geopandas as gpd
import shapely.wkt

# Load a csv file (fname is the path to the file)
df = dd.read_csv(
    fname,
    dtype={
        'longitude': float,
        'latitude': float,
        'geometry': 'object',
    },
).repartition(npartitions=njobs)  # njobs is the number of workers I have
# Translate to geometry using shapely
df['geometry'] = df.geometry.map(shapely.wkt.loads, meta=('geometry', 'object'))
# Create a tmp dataframe using a GeoDataFrame and from_geopandas
tmp = dg.from_geopandas(
    gpd.GeoDataFrame(df.head(), geometry='geometry', crs='EPSG:4326'),
    npartitions=1,
)
# Now create the dask_geopandas df
df = dg.from_dask_dataframe(df)
# Need to set metadata here, otherwise spatial_shuffle won't run.
df._meta = tmp.compute()
df = df.spatial_shuffle()