Closed
Description
Describe the bug
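When Categorify is used with a non-default start_index value, the inference version of the operator returns different values from the regular transform. In the example below, the inference output is lower by 1, which is the start_index passed to the operator.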
Steps/Code to reproduce bug
import numpy as np
import cudf
import nvtabular as nvt
input_tensors = {
"a": np.array(["x", "y", "z"])
}
df = cudf.DataFrame(input_tensors)
cat_names = df.columns
cats = cat_names >> nvt.ops.Categorify(start_index=1)
workflow = nvt.Workflow(cats)
workflow.fit(nvt.Dataset(df))
feature_transformed = workflow.transform(df)["a"]
# => [2, 3, 4]
model_config = {}
inference_op = cats.op.inference_initialize(cats.input_columns, model_config)
output_tensors = inference_op.transform(cats.input_columns, input_tensors)
feature_transformed_inference = output_tensors["a"]
# => [1, 2, 3]
Expected behavior
The inference version of the operator should return the same values as the regular operator transform, which is [2, 3, 4] in this example.
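To make the expected behavior concrete, here is a minimal pure-Python sketch of the mismatch. It is not NVTabular's implementation: fit_vocab and encode are hypothetical helpers, and the 1-based ids with 0 reserved for unseen values are an assumption inferred from the outputs reported above. It only illustrates that the training and inference paths agree when both apply the same start_index offset.

```python
def fit_vocab(values):
    """Map each distinct value to a 1-based id.

    Assumption: id 0 is reserved for unseen values, as the outputs
    in this report suggest. This is not NVTabular's actual code.
    """
    return {value: position + 1 for position, value in enumerate(sorted(set(values)))}


def encode(values, vocab, start_index=0):
    """Look up each value in the vocabulary and shift its id by start_index."""
    return [vocab.get(value, 0) + start_index for value in values]


values = ["x", "y", "z"]
vocab = fit_vocab(values)

# Regular transform path: the offset is applied.
training_ids = encode(values, vocab, start_index=1)

# Hypothetical inference path that drops the offset, reproducing the mismatch.
inference_ids_without_offset = encode(values, vocab)

# Inference path that applies the same offset as the regular transform.
inference_ids_with_offset = encode(values, vocab, start_index=1)

print(training_ids, inference_ids_without_offset, inference_ids_with_offset)
```

A regression test for the real operator could compare the two paths in the same way, with the workflow transform output as the reference.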