Fix udf op #342

jperez999 · 2023-06-08T16:21:13Z

This PR fixes the issues with the UDF op described in NVIDIA-Merlin/systems#360. The problem occurs when the user-defined function is written against a specific framework: the output then comes back in that framework's type rather than in the type of the input, which breaks the expectation that the output type matches the input type. The fix collects the results in a dictionary instead of a collection of the input type. Once collection is complete, we create a dataframe from the dictionary so that downstream operators continue to work. This is necessary to handle all the possible input types (i.e. TensorTable, cuDF DataFrame, pandas DataFrame). If the input is a TensorTable, it will be converted in the executor before the next operator runs.
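The collect-then-convert approach described above can be sketched roughly as follows. This is a minimal, CPU-only illustration and not the code in `merlin/dag/ops/udf.py`: `apply_udf_to_columns` is a hypothetical helper, and `pd.DataFrame` stands in for `make_df`.

```python
import numpy as np
import pandas as pd


def apply_udf_to_columns(df, columns, udf):
    """Hypothetical stand-in for the UDF op's transform step."""
    # Collect outputs in a plain dict keyed by column name, so the container
    # does not depend on whatever type the UDF happens to return.
    outputs = {}
    for name in columns:
        outputs[name] = udf(df[name])
    # Build the dataframe only once collection is complete.
    # Assumption: pd.DataFrame plays the role of make_df here.
    return pd.DataFrame(outputs)


frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
# This UDF returns a NumPy array rather than a pandas Series.
result = apply_udf_to_columns(frame, ["a", "b"], lambda col: np.asarray(col) * 2)
```

Because the dictionary is only a staging container, the return type of the UDF no longer decides the type of the op's output; the single conversion at the end does.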

github-actions · 2023-06-08T16:26:38Z

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-342

oliverholworthy · 2023-06-08T17:48:17Z

merlin/dag/ops/udf.py

@@ -78,7 +79,8 @@ def transform(
            else:
                # shouldn't ever happen,
                raise RuntimeError(f"unhandled UDF param count {self._param_count}")
-        return new_df
+        # return input type data
+        return make_df(new_df)


This works OK for dataframe-based graphs, but we might need to revisit it in a follow-up change to support TensorTable too?

No, I think it's fine. It also works for TensorTable, because the executor handles the conversion to something compatible with the next operator. https://github.com/NVIDIA-Merlin/core/blob/main/merlin/dag/executors.py#L105

What about a user-defined function that returns a TensorColumn type? Is that something we'd like to support?

oliverholworthy · 2023-06-08T17:49:56Z

It's probably worth adding a test for the situation where we pass a pandas DataFrame but the user-defined function returns a cuDF Series, and/or the other way around.

jperez999 · 2023-06-08T18:43:54Z

> It's probably worth adding a test for the situation where we pass a pandas DataFrame but the user-defined function returns a cuDF Series, and/or the other way around.

Test added.
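The test itself is not shown in this thread. As a rough idea of the kind of check requested, here is a CPU-only sketch: a NumPy array and a Python list stand in for the "other framework" return type (cuDF needs a GPU), and `run_udf` is a hypothetical stand-in for the op, not the Merlin API.

```python
import numpy as np
import pandas as pd


def run_udf(df, udf):
    # Hypothetical stand-in for the UDF op: collect per-column results in a
    # dict, then build a dataframe (pd.DataFrame plays the role of make_df).
    return pd.DataFrame({name: udf(df[name]) for name in df.columns})


def test_udf_output_is_dataframe_regardless_of_return_type():
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
    # Each UDF computes the same values but returns a different container.
    udfs = {
        "series": lambda col: col + 1,
        "ndarray": lambda col: col.to_numpy() + 1,
        "list": lambda col: [v + 1 for v in col],
    }
    for label, udf in udfs.items():
        out = run_udf(df, udf)
        assert isinstance(out, pd.DataFrame), label
        assert out["x"].tolist() == [2.0, 3.0, 4.0], label


test_udf_output_is_dataframe_regardless_of_return_type()
```

A GPU version would presumably swap one of the stand-ins for a UDF that converts to or from cuDF, which is the mismatch discussed above.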

jperez999 added 3 commits June 8, 2023 10:16

fix udf operator input and output mismatch

874d55e

fix formatting

eb40e83

clean up and use dictionary and make_df to handle outputs correctly

6891b8c

jperez999 added the bug Something isn't working label Jun 8, 2023

jperez999 added this to the Merlin 23.06 milestone Jun 8, 2023

jperez999 requested a review from oliverholworthy June 8, 2023 16:21

jperez999 self-assigned this Jun 8, 2023

jperez999 mentioned this pull request Jun 8, 2023

[BUG] getting error when serving 0_transformworkflowtriton in session-based example NVIDIA-Merlin/systems#360

Closed

oliverholworthy reviewed Jun 8, 2023

jperez999 requested review from oliverholworthy, rnyak and nv-alaiacano June 8, 2023 17:55

add test for convert in lambda

599363a

oliverholworthy approved these changes Jun 9, 2023

oliverholworthy merged commit cc89d2a into NVIDIA-Merlin:main Jun 9, 2023
