
ENH: read a list of GIS files into chunks #79

Open
@martinfleis

Description


I have a list of GeoPackages, one per urban area, that I need to read into a dask_geopandas.GeoDataFrame. Since the files are already essentially spatially partitioned, the optimal approach would be to read each one directly as a chunk. Currently I have to read them one by one with GeoPandas, concatenate them, and then create a dask_geopandas.GeoDataFrame from the resulting geopandas.GeoDataFrame, which loses the spatial partitioning.

For cases like this, it may be useful to have a dask_geopandas.read_files(list) function that calls geopandas.read_file for each chunk and creates a chunked GeoDataFrame directly. It would help to accept either a list of paths or a path to a folder (as we do with parquet), since a list lets you specify a path inside a zip archive, for example (my case).
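The "list or folder" input described above could be normalised by a small helper before any reading happens. This is a stdlib-only sketch; the name `expand_paths`, the default `*.gpkg` pattern and the sorted order are assumptions chosen for illustration, not part of dask_geopandas.

```python
from pathlib import Path


def expand_paths(paths, pattern="*.gpkg"):
    """Hypothetical helper: accept either a folder or an explicit list of paths.

    A folder is expanded to its matching files, sorted so the partition order
    is deterministic. A list is passed through untouched, because entries such
    as "one.zip!data/file.gpkg" point inside archives and are not plain files
    on disk.
    """
    if isinstance(paths, (str, Path)) and Path(paths).is_dir():
        # Folder input: one entry per matching file, in sorted order
        return sorted(str(p) for p in Path(paths).glob(pattern))
    if isinstance(paths, (str, Path)):
        # A single file or archive-member path
        return [str(paths)]
    # Explicit list: keep every entry verbatim
    return [str(p) for p in paths]
```

Passing a folder yields its GeoPackages in a stable order, while a list such as the zip paths in this issue is kept verbatim, so each entry can still become its own partition.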

This is the existing code I am using:

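```python
import dask_geopandas
import geopandas as gpd
import pandas as pd

paths = ["foo/bar/one.zip!data/file.gpkg", "foo/bar/two.zip!data/file.gpkg"]
gdfs = []

# Read each file eagerly into its own in-memory GeoDataFrame
for file in paths:
    gdf = gpd.read_file(file)
    gdfs.append(gdf)

# ignore_index=True avoids repeated index values; otherwise the index sort
# done by from_geopandas interleaves rows from different files
gdf = pd.concat(gdfs, ignore_index=True)

# Partitions are row-count slices of the combined frame, not one per file
ddf = dask_geopandas.from_geopandas(gdf, npartitions=2)  # non-spatial chunks
```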
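To make the "non-spatial chunks" point concrete, here is a simplified model: assume the concatenated rows are split into `npartitions` contiguous blocks of roughly equal length, which approximates what `from_geopandas` does with a unique, sorted index. The hypothetical helper `files_per_partition` reports which source files end up in each block.

```python
def files_per_partition(file_sizes, npartitions):
    """Hypothetical helper: list the source files contained in each partition.

    Simplified model: rows of all files are concatenated in order and split
    into `npartitions` contiguous blocks of (nearly) equal length.
    """
    # Source file index for every row, in concatenation order
    owners = [i for i, n in enumerate(file_sizes) for _ in range(n)]
    total = len(owners)
    # Block boundaries at (roughly) equal row counts
    bounds = [round(k * total / npartitions) for k in range(npartitions + 1)]
    return [sorted(set(owners[bounds[k]:bounds[k + 1]])) for k in range(npartitions)]


print(files_per_partition([100, 300], npartitions=2))
```

With two files of 100 and 300 rows, the first partition mixes both files (`[[0, 1], [1]]`), and partition boundaries only line up with file boundaries when the sizes happen to divide evenly. Reading one partition per file would always keep each file in its own partition.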

And this would be optimal:

```python
import dask_geopandas

paths = ["foo/bar/one.zip!data/file.gpkg", "foo/bar/two.zip!data/file.gpkg"]

ddf = dask_geopandas.read_files(paths)  # proposed API: one chunk per file
```

Labels: enhancement (New feature or request)