Predictions with ONNX don't support non-numeric inputs #688

Open
@kindofluke

Description

The ONNX prediction function attempts to cast all input columns to np.float32, which makes it incompatible with string and categorical features.

Within ONNXPredictor.predict, we can see the conversion (I've added the comment):

    def predict(self, data, model, **kwargs):
        super(ONNXPredictor, self).predict(data, model, **kwargs)

        input_names = [i.name for i in model.get_inputs()]
        session_result = model.run(None, {input_names[0]: data.to_numpy(np.float32)})  # CONVERSION TO FLOAT FAILS FOR STRINGS

        if len(session_result) == 0:
            raise DrumCommonException("ONNX model should return at least 1 output.")

        if len(session_result) == 1:
            preds = session_result[0]
        else:
            preds = self._handle_multiple_outputs(model, session_result)
        return preds, None
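
To make the failure concrete, here is a minimal sketch that reproduces the cast outside of DRUM. The DataFrame and its column names are made up for illustration (loosely modeled on the Titanic data); only the `to_numpy(np.float32)` call is taken from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame; column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [22.0, 38.0],
        "fare": [7.25, 71.28],
        "sex": ["male", "female"],
    }
)

# Numeric-only columns convert cleanly to a float32 matrix.
numeric = df[["age", "fare"]].to_numpy(np.float32)

# Including the string column makes the same cast raise.
try:
    df.to_numpy(np.float32)
    cast_error = None
except (ValueError, TypeError) as exc:
    cast_error = exc

print(type(cast_error).__name__, cast_error)
```

The numeric subset converts without complaint, so the problem only appears once a string or categorical column is present in the inbound data.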

Lots of Details

Consider the Titanic survivors example, which has mixed features and uses a ColumnTransformer to apply various scikit-learn transformations in a pipeline.

As noted in the example, ONNX Runtime accepts a dictionary that maps each input name to its own array, instead of a single DataFrame:

inputs = {c: X_test2[c].values for c in X_test2.columns}
sess = rt.InferenceSession("pipeline_titanic.onnx")
pred_onx = sess.run(None, inputs)

DRUM's conversion of the inbound DataFrame would fail in this case, which seems like a very common scenario.
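
One possible direction, sketched here under assumptions rather than proposed as DRUM's actual fix: build the feed dictionary column by column, casting each column to the dtype its ONNX input declares. The helper `dataframe_to_onnx_inputs` and the `ONNX_TO_NUMPY` table are hypothetical names. The sketch also assumes one model input per DataFrame column, each shaped `(n_rows, 1)`, as in the Titanic pipeline. With a real session, the type mapping could be read from `sess.get_inputs()`, whose entries expose `name` and `type`.

```python
import numpy as np
import pandas as pd

# Assumed mapping from ONNX Runtime type strings to NumPy dtypes.
# Only a few common types are covered in this sketch.
ONNX_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(string)": object,
}


def dataframe_to_onnx_inputs(data, input_types):
    """Hypothetical helper: build a per-column feed dict.

    `input_types` maps each model input name to its ONNX type string,
    e.g. {i.name: i.type for i in sess.get_inputs()}.
    """
    feed = {}
    for name, onnx_type in input_types.items():
        dtype = ONNX_TO_NUMPY[onnx_type]
        column = data[name].to_numpy(dtype=dtype)
        # Assumption: each input expects a column vector of shape (n_rows, 1).
        feed[name] = column.reshape(-1, 1)
    return feed


# Illustrative data and input types, not taken from a real model.
df = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})
input_types = {"age": "tensor(float)", "sex": "tensor(string)"}
feed = dataframe_to_onnx_inputs(df, input_types)
```

The resulting `feed` could then be passed as `sess.run(None, feed)`. Models that take a single float tensor would still need the existing single-input path, so a real change would have to choose between the two based on the model's declared inputs.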
