read_parquet_dask load_divisions with bounds #63

Open
@brl0

When using read_parquet_dask to read files written with the pack_partitions_to_parquet method, passing both bounds and load_divisions=True raises a KeyError. Reading the same files with either option on its own works.

Example:

from spatialpandas.io import read_parquet_dask
sdf = read_parquet_dask(path, bounds=bounds, load_divisions=True)

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: -1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-22-56d4a7063dd6> in <module>
----> 1 sdf_div = read_parquet_dask(pre_path, bounds=bounds, load_divisions=True)
      2 sdf_div

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/spatialpandas/io/parquet.py in read_parquet_dask(path, columns, filesystem, load_divisions, geometry, bounds, categories)
    231         path, columns, filesystem,
    232         load_divisions=load_divisions, geometry=geometry, bounds=bounds,
--> 233         categories=categories
    234     )
    235 

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/spatialpandas/io/parquet.py in _perform_read_parquet_dask(paths, columns, filesystem, load_divisions, geometry, bounds, categories)
    371 
    372     if load_divisions:
--> 373         divisions = div_mins + [div_maxes[-1]]
    374         if divisions != sorted(divisions):
    375             raise ValueError(

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    822 
    823         elif key_is_scalar:
--> 824             return self._get_value(key)
    825 
    826         if is_hashable(key):

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    930 
    931         # Similar to Index.get_value, but we do not fall back to positional
--> 932         loc = self.index.get_loc(label)
    933         return self.index._get_values_for_loc(self, loc, label)
    934 

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: -1
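
For context on the failure mode: the traceback shows div_maxes[-1] going through pandas Series.__getitem__, where an integer key is looked up as a label in the integer index rather than as a position. The following is a minimal sketch of that behaviour in plain pandas, not the spatialpandas code itself. The Series values and index are made-up stand-ins, and the idea that div_maxes only becomes a Series once bounds filtering is applied is an assumption based on the traceback.

```python
import pandas as pd

# Hypothetical stand-ins for the per-partition division bounds.
# Assumption: after filtering by bounds, div_maxes is a Series whose
# integer index holds the surviving partition numbers (no label -1).
div_mins = [0, 40, 90]
div_maxes = pd.Series([39, 89, 120], index=[3, 5, 8])

# Scalar integer lookup on an integer-indexed Series is label-based,
# so -1 is searched for as a label and is not found.
try:
    div_maxes[-1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access returns the last element regardless of index labels.
last_by_position = div_maxes.iloc[-1]
divisions = div_mins + [last_by_position]
```

If that assumption holds, positional access (for example .iloc[-1], or converting the Series to a list first) at the div_maxes[-1] step would avoid the KeyError.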
