feat: handling unseen categories in glm h2o architectures #81

vishpillai123 · 2025-12-16T18:31:33Z

H2O's GLM ("Generalized Linear Model") architecture differs in several ways from our other architectures, DRF/XRT, GBM, and XGBoost, all of which are tree-based models. In particular, GLM struggles with unseen categories in enum variables (for example, "term_program_of_study").

The best way to manage this is to impute categorical variables before running SHAP; otherwise we hit hard errors in H2O. I built a helper for the GLM case that imputes each categorical variable with its mode. We also log the imputation before running SHAP.
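For reviewers who want a concrete picture of mode imputation, here is a minimal sketch of the idea. It is not the helper added in this PR: the function name `impute_unseen_categories`, its signature, and the plain-dict row format are all assumptions made for illustration, and it uses only the standard library rather than H2O frames.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def impute_unseen_categories(train_rows, score_rows, categorical_cols):
    """Replace category levels not seen in training with the training mode.

    Hypothetical helper for illustration only; not the one in this PR.
    Rows are plain dicts mapping column name -> value.
    """
    # Copy each row so the caller's data is not mutated.
    imputed = [dict(row) for row in score_rows]
    for col in categorical_cols:
        train_values = [row[col] for row in train_rows if row.get(col) is not None]
        if not train_values:
            continue  # no training values, so there is no mode to impute with
        seen = set(train_values)
        mode = Counter(train_values).most_common(1)[0][0]
        n_replaced = 0
        for row in imputed:
            # Missing values are left alone; only unseen levels are replaced.
            if row.get(col) is not None and row[col] not in seen:
                row[col] = mode
                n_replaced += 1
        if n_replaced:
            # Log before any downstream step (such as SHAP) runs.
            logger.warning(
                "Column %r: imputed %d unseen value(s) with mode %r",
                col, n_replaced, mode,
            )
    return imputed


train = [
    {"term_program_of_study": "biology"},
    {"term_program_of_study": "biology"},
    {"term_program_of_study": "nursing"},
]
score = [
    {"term_program_of_study": "nursing"},
    {"term_program_of_study": "astronomy"},  # level never seen in training
]
result = impute_unseen_categories(train, score, ["term_program_of_study"])
```

In this sketch the mode is taken from the training data, so an unseen level is mapped to a level the model already knows. Whether the mode should come from the training set or the scoring set is a design choice this example assumes rather than takes from the PR.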

We really should move away from high-cardinality categoricals, since they create the most risk. I will work on this separately.

Vishakh Pillai added 3 commits December 16, 2025 13:14

feat: adding in glm case

5a8c77f

feat: imputing with mode of category

c5a8680

fix: type check and added unit test

c366452

vishpillai123 changed the title ~~feat: h2o glm unseen categories~~ feat: handling unseen categories in glm h2o architectures Dec 16, 2025

vishpillai123 requested review from Mesh-ach and nm3224 December 16, 2025 21:16

vishpillai123 marked this pull request as ready for review December 16, 2025 21:19
