Open
Description
The ONNX prediction function attempts to cast all input columns to np.float32,
which makes it incompatible with string and categorical features.
Within ONNXPredictor.predict, we can see the conversion (I've added the comment):
def predict(self, data, model, **kwargs):
    super(ONNXPredictor, self).predict(data, model, **kwargs)
    input_names = [i.name for i in model.get_inputs()]
    # CONVERSION TO FLOAT FAILS FOR STRINGs
    session_result = model.run(None, {input_names[0]: data.to_numpy(np.float32)})
    if len(session_result) == 0:
        raise DrumCommonException("ONNX model should return at least 1 output.")
    if len(session_result) == 1:
        preds = session_result[0]
    else:
        preds = self._handle_multiple_outputs(model, session_result)
    return preds, None
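The failing cast can be reproduced with pandas alone, without DRUM or an ONNX model. The snippet below is a minimal sketch: the two-column frame and the `cast_error` helper are made up for illustration and are not part of DRUM.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame, loosely modeled on Titanic-style features.
df = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})


def cast_error(frame):
    """Return the error message raised by the float32 cast, or None if the cast succeeds."""
    try:
        # Same call that ONNXPredictor.predict makes on the inbound data.
        frame.to_numpy(np.float32)
    except (ValueError, TypeError) as exc:
        return str(exc)
    return None


numeric_only_error = cast_error(df[["age"]])  # numeric columns cast cleanly
mixed_error = cast_error(df)                  # the string column breaks the cast
```

Here `numeric_only_error` stays `None`, while `mixed_error` holds the message from the exception raised on the string column.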
Lots of Details
Consider the Titanic Survivors example, which has mixed feature types and uses a ColumnTransformer
to apply various scikit-learn transformations in a pipeline.
As noted in the example, ONNX can accept a dictionary that maps each column name to an array of values, instead of a single DataFrame:
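import onnxruntime as rt

# One model input per column, keyed by column name (X_test2 is the test DataFrame from the example)
inputs = {c: X_test2[c].values for c in X_test2.columns}
sess = rt.InferenceSession("pipeline_titanic.onnx")
pred_onx = sess.run(None, inputs)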
DRUM's float32 conversion of the inbound DataFrame would fail in this case, which seems like it would be a very common scenario.
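One possible direction, sketched below, is to build the per-column dictionary from the inbound DataFrame instead of casting everything at once. This is only an illustration under assumptions: `frame_to_onnx_inputs` is a hypothetical helper (not part of DRUM), the model is assumed to have been exported with one input per column, numeric inputs are assumed to be float tensors, and each input is assumed to have shape (n_rows, 1). A model exported with integer inputs would need a different cast.

```python
import numpy as np
import pandas as pd


def frame_to_onnx_inputs(frame):
    """Build a {column name: (n_rows, 1) array} dict from a DataFrame.

    Hypothetical helper: numeric columns become float32, everything else
    (strings, categoricals) becomes an object array of Python strings.
    """
    inputs = {}
    for name in frame.columns:
        col = frame[name]
        if pd.api.types.is_numeric_dtype(col):
            values = col.to_numpy(dtype=np.float32)
        else:
            # Note: missing values are stringified here (e.g. "nan").
            values = col.astype(str).to_numpy(dtype=object)
        inputs[str(name)] = values.reshape(-1, 1)
    return inputs


# Made-up sample data for illustration.
sample = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})
onnx_inputs = frame_to_onnx_inputs(sample)
```

The resulting dictionary could then be passed as the second argument to `sess.run(None, ...)` in the same way as `inputs` in the example above, provided its keys match the input names reported by `model.get_inputs()`.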