Skip to content

Add support for DataArray objects read through rioxarray #445

Open
@darribas

Description

@darribas

The raster functionality in for weights building currently supports DataArray objects created through xarray.open_rasterio:

import xarray
from libpysal.weights import Queen
pop = xarray.open_rasterio(
               'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'
)
w = Queen.from_xarray(pop)

The code above correctly picks up that missing data are specified with -200 and removes them from the calculation of the weights:

w.n
> 97232

This is not the case when we use DataArray objects build through the (generally recommended) rioxarray.open_rasterio:

import rioxarray
pop2  = rioxarray.open_rasterio(
               'https://geographicdata.science/book/_downloads/5263090bd0bdbd7d1635505ff7d36d04/ghsl_sao_paulo.tif'
)
w2 = Queen.from_xarray(pop2)

which does not pick up the missing data:

w.n
> 194688

My hunch is that this is because the builder picks up missing values only through the attrs object:

if "nodatavals" in da.attrs and da.attrs["nodatavals"]:
mask = (ser != da.attrs["nodatavals"][0]).to_numpy()
ids = np.where(mask)[0]
id_map = _idmap(ids, mask, dtype)
ser = ser[ser != da.attrs["nodatavals"][0]]

But rioxarray does not populate that general object, instead logs missing data under pop2.rio.nodata:

pop.attrs
> {'transform': (250.0, 0.0, -4482000.0, 0.0, -250.0, -2822000.0),
 'crs': '+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs=True',
 'res': (250.0, 250.0),
 'is_tiled': 0,
 'nodatavals': (-200.0,),
 'scales': (1.0,),
 'offsets': (0.0,),
 'AREA_OR_POINT': 'Area',
 'grid_mapping': 'spatial_ref'}

pop2.attrs
> {'_FillValue': -200.0, 'scale_factor': 1.0, 'add_offset': 0.0}

pop2.rio.nodata
> -200.0

I think we should support both formats, read through vanilla xarray and with rioxarray loaded in the session too. The former is a more generic approach, the latter is a more robust one for rasters.

Not sure what'd be the ideal way of supporting this. I'm cc'ing here @MgeeeeK since he worked on the original implementation and might have some ideas, and @snowman2 as the main driver behind rioxarray in case they have some suggestions in terms of best practices.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions