Dask Summit 2021 - "Scaling geospatial vector data" workshop #4

Open
@jorisvandenbossche

During the Dask Summit, we have a 2-hour workshop on scaling geospatial vector data, scheduled for Thursday, May 20th, 11:00-13:00 UTC (https://summit.dask.org/schedule/presentation/22/scaling-geospatial-vector-data/).

We can use this issue to further gather ideas and discuss the exact content of the workshop.

Workshop abstract:

The geospatial Python ecosystem provides a nice set of tools for working with vector data, including Shapely for geometry operations and GeoPandas for working with tabular data (and many other packages for IO, visualization, domain-specific processing, …). Two limitations of those core tools are sub-optimal performance and limited scaling possibilities.

Over the last years, effort has gone into improving performance through vectorized interfaces to GEOS, the underlying C library of Shapely. In turn, that enables releasing the GIL, which makes the Dask-GeoPandas combination more interesting. GeoPandas is an extension of the pandas DataFrame, so the way Dask scales pandas can be applied to GeoPandas as well. An initial effort to build a bridge between Dask and GeoPandas is currently taking shape as the dask-geopandas library.

Other interesting efforts in this space are popping up as well. The SpatialPandas package provides alternative pandas and Dask extensions for vectorized spatial and geometric operations. Libraries such as datashader and pydeckgl can be used to visualize larger spatial datasets.

This workshop will give a brief overview of some of the packages and ongoing efforts, and provide a place to discuss further improvements and interoperability between the libraries, with an emphasis on the conceptual design of distributed computation on inherently unpredictable vector data.

More detailed agenda:

  • Demo of dask-geopandas - Joris Van den Bossche
  • spatialpandas - Jon Mease
  • Datashader for visualizing geospatial data - Jim Bednar
  • Use cases:
    • Stefanie Lumnitz - GEDI data for biomass estimation
    • Anita Graser - movement data
    • Dani Arribas-Bel - areal interpolation
  • Partitioning of spatial data - Martin Fleischmann + discussion
  • IO - brief overview of current possibilities + open discussion about what is needed
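
As a possible starting point for the partitioning discussion, here is a minimal pure-Python sketch of the general idea: group points into partitions by regular grid cell so that nearby geometries end up together, then compute something per partition independently (here, a bounding box). This is only an illustration under simplifying assumptions (points only, a fixed grid, threads standing in for Dask tasks). It is not the dask-geopandas or spatialpandas API, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def grid_cell(point, cell_size):
    """Return the (column, row) index of the grid cell containing a point."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def partition_points(points, cell_size):
    """Group points by grid cell so that nearby points share a partition."""
    partitions = defaultdict(list)
    for point in points:
        partitions[grid_cell(point, cell_size)].append(point)
    return dict(partitions)


def bounds(points):
    """Bounding box (minx, miny, maxx, maxy) of a single partition."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def partition_bounds(partitions):
    """Compute each partition's bounding box as an independent task."""
    keys = list(partitions)
    with ThreadPoolExecutor() as pool:
        boxes = list(pool.map(bounds, (partitions[k] for k in keys)))
    return dict(zip(keys, boxes))


# Hypothetical sample data: two clusters of points.
points = [(0.5, 0.5), (1.5, 0.2), (0.1, 0.9), (3.2, 3.8), (3.9, 3.1)]
partitions = partition_points(points, cell_size=2.0)
boxes = partition_bounds(partitions)
```

With per-partition bounding boxes known, a spatial query would only need to touch the partitions whose box intersects the query window. Real vector data is less tidy than this (skewed density, geometries spanning several cells), which is part of what the discussion could cover.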

cc @martinfleis @jsignell
