Feature Enhancement Proposal: Wilkinson Formulas and New Model Interfaces #101

@tdhoffman

Pursuant to the discussion in #95, I've implemented a version of Wilkinson formulas for spatial lag and spatial error models. The code is available on the formula branch of my fork of spreg. spreg.from_formula() parses a Wilkinson formula and returns a fitted OLS, spatial lag, or spatial error model depending on the user's inputs. The function signature is:

spreg.from_formula(formula, df, w=None, method="gm", debug=False, **kwargs)
where:

  • formula is a string following formulaic's syntax plus the additional syntax described below
  • df is a DataFrame or GeoDataFrame
  • w is a libpysal.weights.W object
  • method is a string describing the estimation method (for spatial lag or error models)
  • debug is a boolean which (when true) outputs the parsed formula alongside the configured model
  • **kwargs are additional arguments to be passed to the model class

The new formula syntax comes in two parts:

  • The <...> operator:
    • Enclose variables (covariates or response) in angle brackets to denote a spatial lag.
    • < and > are not reserved characters in formulaic, so there are no conflicts.
    • Usage: <FIELD> adds the spatial lag of that field from the dataframe to the model matrix.
    • Can use other transformations within <...>, e.g. <{10*FIELD1} + FIELD2> will be correctly parsed.
  • The & symbol:
    • Adds a spatial error component to a model.
    • & is not a reserved character in formulaic, so there are no conflicts.
  • The parser accepts combinations of <...> and &: <FIELD1 + ... + FIELDN> + & is the most general spatial model available. However, the dispatcher does not yet dispatch to the combo model classes (a future TODO).
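
To make concrete what the <...> operator adds to the model matrix, here is a minimal sketch of a spatial lag computed by hand. It is plain Python, not the code on the formula branch: the `spatial_lag` helper, the toy `neighbors` and `weights` dicts (which only loosely mirror the mappings a libpysal.weights.W object carries), and the data values are all assumptions made for illustration.

```python
# Hypothetical toy data: three observations on a line, 0 - 1 - 2.
# The dict layout loosely mirrors the neighbors/weights mappings of a
# libpysal.weights.W object, but this is plain Python, not libpysal.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
weights = {0: [1.0], 1: [0.5, 0.5], 2: [1.0]}  # each row sums to 1

def spatial_lag(values, neighbors, weights):
    """Return the weighted sum of each observation's neighbors (W times values)."""
    return [
        sum(w * values[j] for j, w in zip(neighbors[i], weights[i]))
        for i in range(len(values))
    ]

field = [10.0, 20.0, 60.0]
lagged = spatial_lag(field, neighbors, weights)
print(lagged)  # [20.0, 35.0, 20.0]
```

Under this reading, writing <FIELD> in a formula would contribute a column like `lagged` rather than the raw `field` values.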

Importantly, all terms and operators MUST be space-delimited so that the parser can recognize the tokens. The current design also requires the user to construct a weights matrix first, which I think makes sense because the weights functionality is well documented and external to actually running the model.
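
The space-delimiting requirement can be illustrated with a rough sketch of whitespace tokenization. This is not the parser from the formula branch: `find_spatial_tokens` is a hypothetical helper, the field names y, x1, and x2 are made up, and the detection rules (a token beginning with < marks a lag, a standalone & marks an error component) are assumptions chosen to show why missing spaces hide the spatial tokens.

```python
def find_spatial_tokens(formula):
    """Report whether a formula contains lag brackets or an error symbol.

    Hypothetical helper for illustration only; it assumes tokens are
    separated by whitespace and does not reproduce the real parser.
    """
    tokens = formula.split()  # split on whitespace only
    has_lag = any(tok.startswith("<") for tok in tokens)
    has_error = "&" in tokens  # & must stand alone as its own token
    return has_lag, has_error

# Properly spaced: both spatial components are found.
print(find_spatial_tokens("y ~ x1 + <x2> + &"))  # (True, True)

# No spatial syntax at all.
print(find_spatial_tokens("y ~ x1 + x2"))  # (False, False)

# Missing spaces: "+<x2>+&" is one token, so neither component is found.
print(find_spatial_tokens("y ~ x1 +<x2>+&"))  # (False, False)
```

The last call shows the failure mode: without spaces, the bracketed term and the & are swallowed into a single token and a whitespace-based scan would treat the formula as non-spatial.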

While implementing this, I ran into stumbling blocks in other parts of the spreg API that have led me to work on a redesign of the basic modeling classes and their dependencies. These changes can be found in the api-dev branch of my fork of spreg, where I've been streamlining the user-oriented API to OLS (see prop_ols.py), spatial lag (prop_lag.py), and spatial error (prop_err.py) models. These new interfaces are works in progress and will be described further in a future Feature Enhancement Proposal. However, the new interfaces I've created are not backwards-compatible with current spreg code. Looking ahead, would it make sense to focus on designing a new package with updated spatial regression interfaces, or to create parallel interfaces in spreg to the same estimation code?
