Implement support for missing values with XGBoost #481

Open
@styrmis

Description

Inference for XGBoost models uses NaiveAdditiveDecisionTree. Because it is a DenseLtrRanker, it fills in missing values with 0. If the data that the XGBoost model was trained on contained missing values, the scores produced in training may not match those in production unless missing values in the training data are likewise filled in with 0.

This is related to #135 and #353, and has been partly implemented in #452 (which was ultimately merged via #480). With that change the tree now visits the designated missing node when a score is missing, but this branch is never reached in practice, because missing scores are filled in with 0 at inference time.
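To make the mismatch concrete, here is a minimal Python sketch of a single tree with a designated missing direction. It is not the plugin's Java code: `Leaf`, `Split`, `missing_goes_left`, `evaluate` and `zero_fill` are hypothetical names chosen for illustration, and the only behaviour assumed from XGBoost is that a score below the threshold goes left and a missing score follows the node's default direction.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Split:
    feature: int             # index into the score vector
    threshold: float         # score < threshold goes left
    left: "Node"
    right: "Node"
    missing_goes_left: bool  # hypothetical name for the default direction


Node = Union[Leaf, Split]


def evaluate(node: Node, scores: List[float]) -> float:
    """Walk one tree; NaN marks a missing score."""
    while isinstance(node, Split):
        score = scores[node.feature]
        if math.isnan(score):
            # Designated missing branch.
            node = node.left if node.missing_goes_left else node.right
        elif score < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.value


def zero_fill(scores: List[float]) -> List[float]:
    """Mimic a dense ranker that replaces missing scores with 0."""
    return [0.0 if math.isnan(s) else s for s in scores]


# One split on feature 0 at 0.5; missing scores are sent right.
tree = Split(feature=0, threshold=0.5,
             left=Leaf(1.0), right=Leaf(2.0),
             missing_goes_left=False)

raw = [float("nan")]
with_missing_branch = evaluate(tree, raw)       # follows the default direction
after_zero_fill = evaluate(tree, zero_fill(raw))  # 0 < 0.5, so goes left
```

In this sketch the two calls land on different leaves, which is the training versus production discrepancy described above: once the input has been zero-filled, the missing branch can never be taken.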

This issue proposes that we alter the implementation of NaiveAdditiveDecisionTree to not fill in missing values, given that the implementation now correctly follows the model specification when missing values are encountered.

In the meantime we have found that we can achieve parity in scoring between training and inference by filling in missing values with 0 in the training data.
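As a rough illustration of that workaround, the sketch below zero-fills a feature matrix before it is handed to training. The helper name `fill_missing_with_zero` and the use of `None` or NaN as the missing marker are assumptions for the example, not part of the plugin or of XGBoost.

```python
import math
from typing import List, Optional, Sequence


def fill_missing_with_zero(
    rows: Sequence[Sequence[Optional[float]]],
) -> List[List[float]]:
    """Return a copy of the feature matrix with None/NaN replaced by 0.0."""
    return [
        [0.0 if v is None or math.isnan(v) else float(v) for v in row]
        for row in rows
    ]


# Hypothetical training rows: one feature vector per document.
training_rows = [
    [1.5, None],
    [float("nan"), 2],
]

filled_rows = fill_missing_with_zero(training_rows)
```

Training on `filled_rows` means the model never learns a separate direction for missing scores, so the zero-filled inputs seen at inference time follow the same paths as in training.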
