Graph.lag should support DataFrames #813

@martinfleis

Graph.lag is currently inconsistent. It can compute the lag of multiple columns at once, but only when they are passed as a NumPy array. Passing the same columns as a DataFrame raises an AttributeError.

import geopandas as gpd
from geodatasets import get_path

from libpysal.graph import Graph

df = gpd.read_file(get_path('geoda south'))
w = Graph.build_contiguity(df)

# passing as an array
w.lag(df[['HR60', 'HR70', 'HR80', 'HR90']].values)

array([[ 4.60723336,  3.28482738,  3.21367741,  1.23493381],
       [ 2.65699616,  9.44736896, 14.19951668,  3.56709144],
       [ 5.48348091,  7.71768729,  7.21931735,  5.69651071],
       ...,
       [56.8560935 , 54.66925574, 50.48798788, 35.41970065],
       [31.04795542, 33.44684482, 35.63513443, 43.98784183],
       [37.2522942 , 26.51949069, 25.9177717 , 11.0978376 ]],
      shape=(1412, 4))

# passing as a DataFrame

w.lag(df[['HR60', 'HR70', 'HR80', 'HR90']])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/2f/fhks6w_d0k556plcv3rfmshw0000gn/T/ipykernel_70018/4208695360.py in ?()
----> 1 w.lag(df[['HR60', 'HR70', 'HR80', 'HR90']])

~/dev/pysal/libpysal/libpysal/graph/base.py in ?(self, y, categorical, ties)
   2083         >>> contiguity_r = contiguity.transform("r")
   2084         >>> contiguity_r.lag(y)
   2085         array([4.2, 1.5, 3. , 2.6, 4.5, 0. , 3. , 0. , 0. ])
   2086         """
-> 2087         return _lag_spatial(self, y, categorical=categorical, ties=ties)

~/dev/pysal/libpysal/libpysal/graph/_spatial_lag.py in ?(graph, y, categorical, ties)
     83     if isinstance(y, list):
     84         y = np.array(y)
     85 
     86     if (
---> 87         isinstance(y.dtype, pd.CategoricalDtype)
     88         or pd.api.types.is_object_dtype(y.dtype)
     89         or pd.api.types.is_bool_dtype(y.dtype)
     90         or pd.api.types.is_string_dtype(y.dtype)

~/dev/pysal/.pixi/envs/default/lib/python3.11/site-packages/pandas/core/generic.py in ?(self, name)
   6317             and name not in self._accessors
   6318             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6319         ):
   6320             return self[name]
-> 6321         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'dtype'

I think we can make the categorical dtype checks in _lag_spatial more robust so that DataFrame input is allowed.
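One possible shape for a more robust check, as a minimal sketch rather than the actual libpysal implementation: the helper names `_dtype_is_categorical_like` and `is_categorical_like` are hypothetical, and the four dtype tests are copied from the lines shown in the traceback. The only change is that a DataFrame is inspected column by column through `DataFrame.dtypes`, since it has no single `dtype` attribute.

```python
import numpy as np
import pandas as pd


def _dtype_is_categorical_like(dtype):
    """Apply the four checks from the traceback to a single dtype."""
    return (
        isinstance(dtype, pd.CategoricalDtype)
        or pd.api.types.is_object_dtype(dtype)
        or pd.api.types.is_bool_dtype(dtype)
        or pd.api.types.is_string_dtype(dtype)
    )


def is_categorical_like(y):
    """Hypothetical helper: True if ``y`` (or any of its columns) looks categorical."""
    if isinstance(y, list):
        y = np.array(y)
    if isinstance(y, pd.DataFrame):
        # A DataFrame has no single ``dtype``, so check every column's dtype.
        return any(_dtype_is_categorical_like(dt) for dt in y.dtypes)
    # Series and ndarray both expose ``dtype`` directly.
    return _dtype_is_categorical_like(y.dtype)
```

With a check like this in place, an all-numeric DataFrame could presumably be converted with `y.to_numpy()` and follow the existing array path. Whether the result should come back as an array or as a DataFrame with the original index and columns is a separate design question.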
