Feature Enhancement Proposal: Wilkinson Formulas and New Model Interfaces #101

Pursuant to the discussion in #95, I've implemented a version of Wilkinson formulas for spatial lag and spatial error models. The code is available on the `formula` branch of [my fork of `spreg`](https://github.com/tdhoffman/spreg/blob/formula/spreg/formula.py). `spreg.from_formula()` parses a Wilkinson formula and returns a fitted OLS, spatial lag, or spatial error model depending on the user's inputs. The function signature is:

`spreg.from_formula(formula, df, w=None, method="gm", debug=False, **kwargs)`
where:
- `formula` is a string following [`formulaic`](https://matthewwardrop.github.io/formulaic/)'s syntax, plus the additional syntax described below
- `df` is a `DataFrame` or `GeoDataFrame`
- `w` is a `libpysal.weights.W` object
- `method` is a string describing the estimation method (for spatial lag or error models)
- `debug` is a boolean which (when true) outputs the parsed formula alongside the configured model
- `**kwargs` are additional arguments to be passed to the model class

The new formula syntax comes in two parts:
- The `<...>` operator:
    - Enclose variables (covariates or response) in angle brackets to denote a spatial lag.
    - `<` and `>` are not reserved characters in `formulaic`, so there are no conflicts.
    - Usage: `<FIELD>` adds the spatial lag of that field from the dataframe to the model matrix.
    - Other transformations can be used within `<...>`, e.g. `<{10*FIELD1} + FIELD2>` will be parsed correctly.
- The `&` symbol:
    - Adds a spatial error component to a model.
    - `&` is not a reserved character in `formulaic`, so there are no conflicts.
- The parser accepts combinations of `<...>` and `&`: `<FIELD1 + ... + FIELDN> + &` is the most general spatial model available. However, the dispatcher does not yet dispatch to the combo model classes (a future TODO).

Importantly, all terms and operators MUST be space-delimited so that the parser can pick up the tokens correctly. The current design also requires the user to construct a weights matrix first, which I think makes sense, as the weights functionality is well-documented and external to the actual running of the model.
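To illustrate why the spacing matters, here is a minimal sketch of a whitespace-based scan for the `<...>` and `&` markers. This is not the parser in `formula.py`; `scan_formula` and the field names `y`, `x1`, `x2`, `x3` are hypothetical and only show how a tokenizer that splits on spaces could miss markers in an unspaced formula.

```python
def scan_formula(formula):
    """Split a formula on whitespace and flag the spatial markers.

    Hypothetical helper for illustration only; it does not build a model.
    """
    tokens = formula.split()
    lagged, in_lag = [], False
    for tok in tokens:
        if tok.startswith("<"):
            in_lag = True  # entering a <...> group
        if in_lag and tok not in {"+", "-", "*", ":"}:
            lagged.append(tok.strip("<>"))  # record the term inside the group
        if tok.endswith(">"):
            in_lag = False  # leaving the <...> group
    # "&" only counts as the error marker when it is its own token
    return {"tokens": tokens, "lagged": lagged, "has_error": "&" in tokens}


spaced = scan_formula("y ~ x1 + <x2 + x3> + &")
unspaced = scan_formula("y ~ x1+<x2>+&")

print(spaced["lagged"], spaced["has_error"])
print(unspaced["lagged"], unspaced["has_error"])
```

With spaces, the scan sees `<x2` and `x3>` as the ends of a lag group and `&` as a standalone token. Without spaces, `x1+<x2>+&` arrives as a single token, so neither marker is detected.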

While implementing this, I ran into stumbling blocks in other parts of the `spreg` API that have led me to work on a redesign of the basic modeling classes and their dependencies. These changes can be found in the `api-dev` branch of my fork of `spreg`, where I've been streamlining the user-oriented API to OLS (see `prop_ols.py`), spatial lag (`prop_lag.py`), and spatial error (`prop_err.py`) models. These new interfaces are works in progress and will be described further in a future Feature Enhancement Proposal. However, the new interfaces I've created are not backwards-compatible with current `spreg` code. Looking ahead, would it make sense to focus on designing a new package with updated spatial regression interfaces, or to create parallel interfaces in `spreg` to the same estimation code?
