Description
Pursuant to the discussion in #95, I've implemented a version of Wilkinson formulas for spatial lag and spatial error models. The code is available on the formula branch of my fork of spreg. spreg.from_formula() parses a Wilkinson formula and returns a fitted OLS, spatial lag, or spatial error model depending on the user's inputs. The function signature is:
spreg.from_formula(formula, df, w=None, method="gm", debug=False, **kwargs)
where:
- `formula` is a string following `formulaic`'s syntax and the below additional syntax
- `df` is a `DataFrame` or `GeoDataFrame`
- `w` is a `libpysal.weights.W` object
- `method` is a string describing the estimation method (for spatial lag or error models)
- `debug` is a boolean which (when true) outputs the parsed formula alongside the configured model
- `**kwargs` are additional arguments to be passed to the model class
The new formula syntax comes in two parts:
- The `<...>` operator:
  - Enclose variables (covariates or response) in angle brackets to denote a spatial lag.
  - `<` and `>` are not reserved characters in `formulaic`, so there are no conflicts.
  - Usage: `<FIELD>` adds spatial lag of that field from the dataframe to model matrix.
  - Can use other transformations within `<...>`, e.g. `<{10*FIELD1} + FIELD2>` will be correctly parsed.
- The `&` symbol:
  - Adds a spatial error component to a model.
  - `&` is not a reserved character in `formulaic`, so there are no conflicts.
- The parser accepts combinations of `<...>` and `&`:
  - `<FIELD1 + ... + FIELDN> + &` is the most general possible spatial model available. However, the dispatcher does not currently dispatch to the combo model classes (future TODO).
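For readers less familiar with the term, the spatial lag that `<FIELD>` refers to is conventionally the weights matrix applied to the field, i.e. a weighted average of each observation's neighbors. The following is a minimal pure-Python sketch of that idea, not the code on the branch. The `neighbors` dictionary, the equal (row-standardized) weights, and the `spatial_lag` helper are all assumptions made for illustration; in practice a `libpysal.weights.W` object would supply the neighbor structure and weights.

```python
# Hypothetical chain of 4 observations: each one neighbors the observations
# directly next to it. This mimics the neighbor structure that a weights
# object would normally provide.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Hypothetical values of FIELD for the 4 observations.
field = [10.0, 20.0, 30.0, 40.0]


def spatial_lag(neighbors, values):
    """Average each observation's neighbors (assumes row-standardized weights)."""
    return [
        sum(values[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(len(values))
    ]


lagged_field = spatial_lag(neighbors, field)
print(lagged_field)  # [20.0, 20.0, 30.0, 30.0]
```

Under these assumptions, `<FIELD>` would contribute a column like `lagged_field` to the model matrix alongside (or instead of) the raw `FIELD` column.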
Importantly, all terms and operators MUST be space-delimited so that the parser can pick up the tokens correctly. The current design also requires the user to construct a weights matrix beforehand, which I think makes sense because the weights functionality is well documented and external to the actual running of the model.
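To illustrate why the whitespace requirement matters, here is a rough sketch of a whitespace-based scanner. It is not the parser on the branch: the `scan_formula` function, its return format, and the rule that `<` opens and `>` closes a lag group are assumptions chosen only to show how a token-based approach depends on spaces.

```python
def scan_formula(formula):
    """Split a space-delimited formula and flag spatial syntax (illustrative only)."""
    plain, lagged, has_error = [], [], False
    in_lag = False
    for token in formula.split():
        if token == "&":
            # A standalone ampersand requests a spatial error component.
            has_error = True
            continue
        if token in {"+", "~"}:
            # Ordinary formula operators carry no spatial meaning here.
            continue
        if token.startswith("<"):
            # An opening angle bracket starts a spatial lag group.
            in_lag = True
            token = token[1:]
        closes = token.endswith(">")
        if closes:
            token = token[:-1]
        if token:
            (lagged if in_lag else plain).append(token)
        if closes:
            in_lag = False
    return {"plain": plain, "lagged": lagged, "spatial_error": has_error}


# Space-delimited input: both the lag group and the error term are detected.
print(scan_formula("y ~ x1 + <x2 + x3> + &"))
# {'plain': ['y', 'x1'], 'lagged': ['x2', 'x3'], 'spatial_error': True}

# Without spaces, "x1+&" is a single token and the error term is missed.
print(scan_formula("y ~ x1+&"))
# {'plain': ['y', 'x1+&'], 'lagged': [], 'spatial_error': False}
```

In this sketch the response variable is simply listed with the other non-lagged terms; a real dispatcher would additionally need to separate the two sides of `~`.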
While implementing this, I ran into stumbling blocks in other parts of the spreg API that have led me to work on a redesign of the basic modeling classes and their dependencies. These changes can be found in the api-dev branch of my fork of spreg, where I've been streamlining the user-oriented API to OLS (see prop_ols.py), spatial lag (prop_lag.py), and spatial error (prop_err.py) models. These new interfaces are works in progress and will be described further in a future Feature Enhancement Proposal. However, the new interfaces I've created are not backwards-compatible with current spreg code. Looking ahead, would it make sense to focus on designing a new package with updated spatial regression interfaces, or to create parallel interfaces in spreg to the same estimation code?