Add transformer defined by R-style formula #406

I think it would be useful (especially to R users) to have an MLJ formula-based transformer that can be inserted anywhere in an MLJ pipeline (or other composite model). Here "formula" means "one-sided formula". I don't think two-sided formulas make much sense in the MLJ context, because the target and features are treated separately, as in sklearn.

The `StatsModels.@formula` apparatus appears to provide most of what is needed here already; see the [docs](https://juliastats.org/StatsModels.jl/stable/formula/). So hopefully this is just a matter of wrapping it.

This transformer would probably be a [`Static` model](https://alan-turing-institute.github.io/MLJ.jl/dev/transformers/#Static-transformers) with a one-sided StatsModels formula as its parameter. Ideally, and for consistency, it would perform a table-to-*table* transformation, rather than the table-to-matrix transformation that StatsModels performs. This does cause problems for very-high-cardinality categorical features (which, if I understand correctly, get expanded into indicator columns when a StatsModels formula is applied), but it has the advantage that the new columns would come with informative names for interpretation downstream of the transformer. Actually, it probably makes sense *not* to force one-hot encoding anyway: not all supervised models need it, and we already have transformers that do one-hot encoding and generate the new column names.

I recall Slack discussions with @kleinschmidt about this (now lost to the ether). Perhaps he would care to chime in.

See also https://github.com/JuliaAI/MLJModels.jl/issues/314 and https://github.com/JuliaAI/MLJGLMInterface.jl/issues/13 
