Skip to content

Handle host frames in serialization #5309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 31, 2020

Conversation

jakirkham
Copy link
Member

Depends on PR ( #5308 )

As some serialization strategies may rely on some or all frames on host, add logic to Serializable to track which frames are on host or device and ensure they are handled appropriately. This should help cases like pack/unpack ( #5025 ) and allow Serializable to be used by other objects using host frames like those in FIL ( rapidsai/cuml#2263 ).

Note: Adding this as a draft for now, though am not aware of anything missing from it (unless reviewers spot something ;)

cc @shwina @hcho3

As some serialization strategies may rely on some or all frames on host,
add logic to `Serializable` to track which frames are on host or device
and ensure they are handled appropriately.
@jakirkham jakirkham added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. labels May 28, 2020
@jakirkham jakirkham marked this pull request as ready for review May 28, 2020 21:26
@jakirkham jakirkham requested a review from a team as a code owner May 28, 2020 21:26
@jakirkham jakirkham changed the title WIP: Handle host frames in serialization Handle host frames in serialization May 28, 2020
@jakirkham jakirkham added 4 - Needs cuDF (Python) Reviewer and removed 2 - In Progress Currently a work in progress labels May 28, 2020
Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shwina
Copy link
Contributor

shwina commented May 28, 2020

FWIW the pickle failure you're seeing here is resolved in the downstream PR #5025; although I'm not clear on why it comes up here.

@jakirkham
Copy link
Member Author

Thanks! Yeah was looking at that as well. It looks like you were able to drop serialize/deserialize from DataFrame. Wonder if there is something missing in those methods.

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs cuDF (Python) Reviewer labels May 29, 2020
@jakirkham
Copy link
Member Author

After looking more closely at the test failures here, I think this is the same issue we ran into yesterday. IOW the pickle test is assuming approximate equality of the size of the pickled data to the size of the Python object. However pickling will always tack in more information for the unpickling process. So we should just be checking that the pickled data is at least as big as the original data (though most likely bigger). Submitted PR ( #5334 ), which grabs your change from PR ( #5025 ). Hopefully once that is in we will clear the failure here.

@jakirkham
Copy link
Member Author

Merged in branch-0.15 with the pickle test fix. Let's see if that resolves it.

@kkraus14 kkraus14 merged commit 8209b96 into rapidsai:branch-0.15 May 31, 2020
@jakirkham jakirkham deleted the hdl_host_frames_ser branch May 31, 2020 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants