-
Notifications
You must be signed in to change notification settings - Fork 448
Add migration tool for legacy .rrd files #9816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Web viewer built successfully. If applicable, you should also test it:
Note: This comment is updated whenever you push a commit. |
I think we shouldn't allow loading 0.22 rrds at all. That sends the wrong message. Only the migrate command should be able to load them. Can we do this in a way that doesn't affect the rest of the codebase? We removed these codepaths in the protobuf migration, because it would've been really annoying to keep them around. For example, we could duplicate the msgpack decoder in the migration tool. |
9833ee8
to
7a8879b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am opposed to putting the old serialization code back into re_log_encoding
. Can we either merge this directly into a release-0.23.2
branch so that it doesn't affect main
(I saw that was the case before - why change it?), or do it in a way that doesn't affect the rest of the codebase, even if we have to duplicate relevant code into another place?
Why are you opposed? I don't see it adding to our maintenance burden. It adds very little code to a piece of the codebase that we rarely change. There are other fixes (improved re_sorbet migration, improved error messages etc) in this branch that I want landed on main, which is why I'm pointing this PR to main. Sure, with some extra work we can split the PR in two, but I want a good motivation for that extra work. |
It is a lot of code. In I also feel like the code changes almost every week. That almost holds up, according to the crate history, the longest period between 2 commits in the last few months was 20 days between december and january (winter holiday). Other than that I'm seeing bursts of commits with at most 1 or 2 weeks inbetween. I've got pending changes to it right now (#9826), and there is more work lined up (depending on priorities, unlikely to happen soon to be fair), like getting rid of LogMsg. |
I can only provide "negative" motivation, which is that the msgpack code will be removed again. Maybe not immediately, but definitely before the 0.24 release. It would be a lot easier if we didn't have to do that, and instead merged it straight into an 0.23.2 release. |
On my first conversion attempt using https://huggingface.co/datasets/pablovela5620/rrd-examples/blob/main/pda-example.rrd I managed to get a technically successful conversion, but sadly got this error when trying to view ![]() |
If you run the same file through the migration command again (deleting the
|
Nevermind, I pushed a commit that should fix it. It turns out in the first pass we only migrated msgpack -> protobuf, and only after loading the data as protobuf would it attempt to migrate record batches. And because you were running on 0.23.1, it wouldn't do that... because the code to migrate the data only exists on this branch. So now the migrate command does both msgpack->protobuf and migrates arrow record batches (by going through |
Adding DNM here, as it's not ready to be merged into |
Tried the latest commit on this branch and at least for the 0.21 version it migrated correctly! |
0d580f4
to
018b863
Compare
What
The migrator supports most 0.22.1 rrd files, and some older ones.
It's not perfect, and we're not gonna keep this code path around forever.
Work-in-progress! Don't use on your actual files YET
Usage
foo.rrd
will be renamed tofoo.backup.rrd
andfoo.rrd
will be created, migrated.You can also pass in multiple files:
Once released in 0.23.2, the command will be
rerun rrd migrate *.rrd
Known issues with the migrator
Let me know if either of these things is a problem for you.
Problems with the 0.22 Rust SDK
It seems like data exported by arrow2 (used by the 0.22 Rust SDK) would sometimes mark some types as nullable=false, but then include a null-buffer with all nulls. arrow-rs (which we now use) will then bail saying something like “error: non-nullable type has nulls”.
Since Rust SDK usage is relatively low, I'm down-prioritizing this right now.
State of the PR
The PR adds in full support for loading pre-0.23 files with the viewer.
In order to lessen our maintenance burden, we may decide to drop support for the pre-0.23 rrd files when we release Rerun 0.24 (even in the migrator). Users can still install Rerun 0.23.2 to migrate their old .rrd files to 0.23, which will then be compatible with all future Rerun version.
I think the added complexity in the decoder is justified by making our users lives better, and its a complexity we can remove right after releasing 0.23.2, or whenever we are sufficiently annoyed by it.