Description
I have a list of GeoPackages, one per urban area, that I need to read into a dask_geopandas.GeoDataFrame. Since the files are already essentially spatially partitioned, the optimal way would be to read each one directly as a chunk. Currently I have to read them one by one via GeoPandas, concatenate them, and then create the dask_geopandas.GeoDataFrame from the resulting geopandas.GeoDataFrame, which loses the spatial partitioning.
For cases like this, it may be useful to have a dask_geopandas.read_files(list) function that would call geopandas.read_file for each chunk and create a chunked GeoDataFrame directly. It would be helpful to be able to pass either a list or a path to a folder (as we do with Parquet), since a list lets you specify a path inside a zip archive, for example (my case).
This is the code I am currently using:
import dask_geopandas
import geopandas as gpd
import pandas as pd

paths = ["foo/bar/one.zip!data/file.gpkg", "foo/bar/two.zip!data/file.gpkg"]

gdfs = []
for file in paths:
    gdf = gpd.read_file(file)
    gdfs.append(gdf)

gdf = pd.concat(gdfs)
ddf = dask_geopandas.from_geopandas(gdf, npartitions=2)  # non-spatial chunks
And this would be optimal:
paths = ["foo/bar/one.zip!data/file.gpkg", "foo/bar/two.zip!data/file.gpkg"]
ddf = dask_geopandas.read_files(paths) # one chunk per file
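In the meantime, something close to this can be approximated with dask.delayed and dask.dataframe.from_delayed. This is a minimal sketch, assuming that importing dask_geopandas registers the geopandas backend with dask (so from_delayed returns a dask_geopandas.GeoDataFrame) and that reading a single row per file is an acceptable way to build the meta; the read_files name here is just the hypothetical one proposed above:

import dask.dataframe as dd
import dask_geopandas  # assumed to register the GeoDataFrame backend with dask
import geopandas as gpd
from dask import delayed

def read_files(paths):
    # One delayed geopandas.read_file call per path -> one partition per file,
    # preserving the per-file spatial coherence of the source data.
    parts = [delayed(gpd.read_file)(path) for path in paths]
    # Build the meta (an empty frame with the right schema) by reading a single
    # row of the first file, so nothing large is loaded eagerly.
    meta = gpd.read_file(paths[0], rows=1).iloc[:0]
    return dd.from_delayed(parts, meta=meta)

paths = ["foo/bar/one.zip!data/file.gpkg", "foo/bar/two.zip!data/file.gpkg"]
ddf = read_files(paths)  # one partition per file

A built-in read_files could do the same internally and additionally compute the spatial partitioning information (e.g. the total bounds of each file), which this sketch does not attempt.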