Add zero-sum encoding option for categorical predictors #973

Open
Chirag3841 wants to merge 1 commit into bambinos:main from Chirag3841:cat

Conversation

@Chirag3841

Thank you for opening a PR!

Before you proceed, please check the following notes.

Issue #941
Description

This PR adds support for an alternative categorical encoding strategy using a zero-sum constraint via pm.ZeroSumNormal.
Users can now choose between "reference" (default) and "zero-sum" encoding through the new categorical_encoding argument in bmb.Model.

Changes done
Added a categorical_encoding argument to bmb.Model
Implemented "zero-sum" encoding for categorical predictors in the PyMC backend using pm.ZeroSumNormal
Added a unit test to verify that the model builds correctly with categorical_encoding="zero-sum"
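
For context, here is a minimal sketch of how the two encodings differ for a single three-level factor. It uses plain Python rather than Bambi or PyMC, and the function names and example means are illustrative assumptions, not code from this PR.

```python
from statistics import fmean


def reference_encoding(group_means):
    """Reference coding: the intercept is the first level's mean and the
    remaining levels are expressed as offsets from it."""
    baseline = group_means[0]
    effects = [m - baseline for m in group_means[1:]]
    return baseline, effects


def zero_sum_encoding(group_means):
    """Zero-sum coding: the intercept is the unweighted mean of the level
    means and every level gets an effect; the effects sum to zero."""
    grand_mean = fmean(group_means)
    effects = [m - grand_mean for m in group_means]
    return grand_mean, effects


# Made-up means for the levels "a", "b", "c" (hypothetical data).
group_means = [1.0, 2.0, 6.0]

ref_intercept, ref_effects = reference_encoding(group_means)
zs_intercept, zs_effects = zero_sum_encoding(group_means)

# Both parameterizations reproduce the same level means.
ref_fitted = [ref_intercept] + [ref_intercept + e for e in ref_effects]
zs_fitted = [zs_intercept + e for e in zs_effects]

print("reference:", ref_intercept, ref_effects)
print("zero-sum: ", zs_intercept, zs_effects)
```

Under reference coding there is one fewer effect than levels, while zero-sum coding keeps one effect per level and relies on the sum-to-zero constraint to stay identifiable.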

@tomicapretto
Collaborator

@Chirag3841, thanks for taking the initiative to start this contribution. I'm afraid supporting zero-sum encoding in Bambi is a far more complex endeavor than what this PR attempts. For instance, we'd need to modify the model-building library, formulae, so that it returns non-full-rank design matrices in the cases where we want to use ZeroSumNormal, and then modify multiple parts of Bambi to account for that change. A model-level argument is also not a suitable solution, because the encoding of each categorical covariate is controlled at the covariate level.
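
To illustrate the non-full-rank point, here is a small sketch in plain Python. It is not formulae's actual output, and the helper names and data are hypothetical. With an intercept plus one indicator column per level, the indicator columns add up to the intercept column, so the coefficients are only determined up to a shift unless a constraint such as sum-to-zero is imposed.

```python
levels = ["a", "b", "c"]
observations = ["a", "b", "c", "a", "c"]  # hypothetical factor values


def one_hot(obs, keep):
    """Return one indicator column per level in `keep`, as a list of rows."""
    return [[1.0 if o == lvl else 0.0 for lvl in keep] for o in obs]


intercept = [1.0] * len(observations)
full = one_hot(observations, levels)  # one column per level

# Each row of the full indicator matrix sums to 1, i.e. the indicator columns
# add up to the intercept column. [intercept | full] is therefore not full rank.
row_sums_full = [sum(row) for row in full]
full_is_rank_deficient = row_sums_full == intercept


def predict(intercept_value, effects):
    """Linear predictor for the intercept plus the full indicator matrix."""
    return [
        intercept_value + sum(x * b for x, b in zip(row, effects))
        for row in full
    ]


# Two coefficient sets that differ by a constant shift give identical
# predictions; only the first one satisfies the sum-to-zero constraint.
zero_sum_effects = [-2.0, -1.0, 3.0]
shifted_effects = [1.0, 2.0, 6.0]
pred_zero_sum = predict(3.0, zero_sum_effects)
pred_shifted = predict(0.0, shifted_effects)

print(full_is_rank_deficient, pred_zero_sum == pred_shifted)
```

The sum-to-zero constraint is what selects a single solution from this family, which is why the design matrix and the prior have to be changed together.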

2 participants