[BUG] Categorify start_index not handled by Inference Op CategorifyTransform #1800

@oliverholworthy

Describe the bug
When using Categorify with a non-default start_index value, the inference version of the operator (CategorifyTransform) returns a different result from the regular operator transform: the start_index offset does not appear to be applied at inference time.

Steps/Code to reproduce bug
import numpy as np
import cudf
import nvtabular as nvt


input_tensors = {
    "a": np.array(["x", "y", "z"])
}
df = cudf.DataFrame(input_tensors)
cat_names = df.columns
cats = cat_names >> nvt.ops.Categorify(start_index=1)
workflow = nvt.Workflow(cats)
workflow.fit(nvt.Dataset(df))

feature_transformed = workflow.transform(df)["a"]
# => [2, 3, 4]

model_config = {}
inference_op = cats.op.inference_initialize(cats.input_columns, model_config)
output_tensors = inference_op.transform(cats.input_columns, input_tensors)

feature_transformed_inference = output_tensors["a"]
# => [1, 2, 3]

Expected behavior
The inference version of the operator should return the same values as the regular operator transform ([2, 3, 4] in the example above, not [1, 2, 3]).
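
To make the expectation concrete, here is a minimal NumPy-only sketch of the invariant. It is not NVTabular's implementation: `fit_vocab` and `encode` are hypothetical helpers, and the sketch assumes that index 0 is reserved for null/unknown values and that every input value is already in the fitted vocabulary.

```python
import numpy as np


def fit_vocab(values):
    # Hypothetical stand-in for fitting Categorify: sorted unique categories.
    return np.unique(values)


def encode(values, vocab, start_index=0):
    # Position in the sorted vocabulary, shifted by 1 so that index 0 stays
    # reserved for null/unknown (an assumption of this sketch), then shifted
    # again by start_index. Only valid for values present in vocab.
    codes = np.searchsorted(vocab, values) + 1
    return codes + start_index


values = np.array(["x", "y", "z"])
vocab = fit_vocab(values)

# Regular transform path: offset applied.
training_path = encode(values, vocab, start_index=1)

# Mimics the reported inference behavior: offset dropped.
inference_without_offset = encode(values, vocab)

# Expected inference behavior: same offset as the regular transform.
inference_with_offset = encode(values, vocab, start_index=1)

print(training_path.tolist())             # [2, 3, 4]
print(inference_without_offset.tolist())  # [1, 2, 3]
assert np.array_equal(training_path, inference_with_offset)
```

In this toy model the two paths only agree when the same start_index is passed to both, which is the behavior expected of the inference operator.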

Labels: P0, bug