Skip to content

Fix udf op #342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 9, 2023
Merged

Fix udf op #342

merged 4 commits into from
Jun 9, 2023

Conversation

jperez999
Copy link
Collaborator

This PR fixes issues with the UDF op outlined here NVIDIA-Merlin/systems#360. This issue occurs when the user defined function is created with a targeted framework. However if the function is setup with a specific framework, the output would end up as the framework specified. This causes issues with the expected output (which should be the same as the input). The fix added here uses a dictionary instead of an input type collection. After the collection is complete we create a dataframe to ensure continued support for downstream operators. This is necessary to be able to all the different types of inputs possible (i.e. TensorTable, Cudf Dataframe, Pandas Dataframe). If using tensortable, it will be converted before the next operator in the executor.

@jperez999 jperez999 added the bug Something isn't working label Jun 8, 2023
@jperez999 jperez999 added this to the Merlin 23.06 milestone Jun 8, 2023
@jperez999 jperez999 requested a review from oliverholworthy June 8, 2023 16:21
@jperez999 jperez999 self-assigned this Jun 8, 2023
@github-actions
Copy link

github-actions bot commented Jun 8, 2023

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-342

@@ -78,7 +79,8 @@ def transform(
else:
# shouldn't ever happen,
raise RuntimeError(f"unhandled UDF param count {self._param_count}")
return new_df
# return input type data
return make_df(new_df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works ok for data frame based graphs, but we might need to revisit in a follow-up change to make this support tensortable too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think its fine, for tensortable it also works because the executor handles the change to something compatible for the next operator. https://github.com/NVIDIA-Merlin/core/blob/main/merlin/dag/executors.py#L105

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about if a user-defined function returns a TensorColumn type, is that something we'd like to support?

@oliverholworthy
Copy link
Member

It's probably worth adding a test for this situation where we pass a pandas data frame but the user defined function returns a cudf series and/or the other way around

@jperez999
Copy link
Collaborator Author

It's probably worth adding a test for this situation where we pass a pandas data frame but the user defined function returns a cudf series and/or the other way around

Test added.

@oliverholworthy oliverholworthy merged commit cc89d2a into NVIDIA-Merlin:main Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants