Add migration tool for legacy .rrd files #9816

Merged
merged 26 commits into from
May 7, 2025
Conversation

emilk
Member

@emilk emilk commented Apr 28, 2025

What

The migrator supports most 0.22.1 rrd files, and some older ones.

It's not perfect, and we're not gonna keep this code path around forever.

⚠️ WARNING

Work-in-progress! Don't use on your actual files YET

Usage

git clone git@github.com:rerun-io/rerun.git
cd rerun
git checkout emilk/rrd-migrator
cargo run --release rrd migrate foo.rrd

foo.rrd will be renamed to foo.backup.rrd and foo.rrd will be created, migrated.
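
The rename-then-rewrite behaviour described above can be pictured with a minimal Python sketch. This is not the migrator's actual code: `convert` is a hypothetical stand-in for the real migration step, and both skipping files that already have a backup and restoring the original on failure are assumptions of this sketch.

```python
from pathlib import Path
from typing import Callable


def backup_path(path: Path) -> Path:
    """Map foo.rrd to foo.backup.rrd (the naming used in the description)."""
    return path.with_name(f"{path.stem}.backup{path.suffix}")


def migrate_in_place(path: Path, convert: Callable[[bytes], bytes]) -> bool:
    """Rename `path` to its backup name, then write the converted bytes to `path`.

    `convert` is a hypothetical placeholder for the real migration.
    Returns False and touches nothing if a backup already exists
    (an assumption of this sketch, so an earlier backup is never overwritten).
    """
    backup = backup_path(path)
    if backup.exists():
        return False
    path.rename(backup)
    try:
        path.write_bytes(convert(backup.read_bytes()))
    except Exception:
        # Assumed recovery step: put the original back if conversion fails.
        path.unlink(missing_ok=True)
        backup.rename(path)
        raise
    return True
```

Calling `migrate_in_place(Path("foo.rrd"), convert)` leaves the untouched original in `foo.backup.rrd` and the converted bytes in `foo.rrd`.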

You can also pass in multiple files:

cargo run --release rrd migrate folder/*.rrd

Once released in 0.23.2, the command will be rerun rrd migrate *.rrd

Known issues with the migrator

  • Relative timestamps will end up interpreted as absolute timestamps (e.g. will show up as 1970-01-01THH:MM:SS)
  • Some data exported by the 0.22 Rust SDK won't load (but Python and C++ should be fine)

Let me know if either of these things is a problem for you.
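
To make the first known issue concrete, here is a small Python illustration of how the same number reads as a duration versus as an absolute time. It assumes the value is stored as integer nanoseconds, which is an assumption of this sketch and not something stated in this PR.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_relative(nanos: int) -> str:
    """Read the value as a duration (the intended meaning)."""
    return str(timedelta(microseconds=nanos // 1_000))


def as_absolute(nanos: int) -> str:
    """Read the same value as time since the Unix epoch (what the migrated file shows)."""
    return (EPOCH + timedelta(microseconds=nanos // 1_000)).isoformat()


ninety_seconds = 90 * 1_000_000_000  # assumed unit: nanoseconds
print(as_relative(ninety_seconds))  # 0:01:30
print(as_absolute(ninety_seconds))  # 1970-01-01T00:01:30+00:00
```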

Problems with the 0.22 Rust SDK

It seems like data exported by arrow2 (used by the 0.22 Rust SDK) would sometimes mark some types as nullable=false, but then include a null-buffer with all nulls. arrow-rs (which we now use) will then bail saying something like “error: non-nullable type has nulls”.
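
The mismatch can be modelled with a toy Python sketch. `Column` and `validate` are hypothetical names invented for illustration, not the arrow2 or arrow-rs APIs; the point is only that a strict reader rejects a column declared non-nullable whose validity buffer marks entries as null.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Column:
    """Toy stand-in for an Arrow array plus its field metadata (not a real Arrow API)."""

    name: str
    nullable: bool
    values: Sequence[object]
    validity: Optional[Sequence[bool]] = None  # None means "no null buffer"

    def null_count(self) -> int:
        if self.validity is None:
            return 0
        return sum(1 for valid in self.validity if not valid)


def validate(column: Column) -> None:
    """Strict check: a non-nullable column must not contain any nulls."""
    nulls = column.null_count()
    if not column.nullable and nulls > 0:
        raise ValueError(f"non-nullable column {column.name!r} has {nulls} nulls")


# The problematic shape described above: nullable=false, yet every entry is null.
legacy = Column("points", nullable=False, values=[0, 0], validity=[False, False])
try:
    validate(legacy)
except ValueError as err:
    print(f"error: {err}")
```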

Since Rust SDK usage is relatively low, I'm down-prioritizing this right now.

State of the PR

The PR adds in full support for loading pre-0.23 files with the viewer.

In order to lessen our maintenance burden, we may decide to drop support for pre-0.23 rrd files when we release Rerun 0.24 (even in the migrator). Users can still install Rerun 0.23.2 to migrate their old .rrd files to 0.23, which will then be compatible with all future Rerun versions.

I think the added complexity in the decoder is justified by making our users' lives better, and it's a complexity we can remove right after releasing 0.23.2, or whenever we are sufficiently annoyed by it.

github-actions bot commented Apr 28, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Commit: 993bc43
Link: https://rerun.io/viewer/pr/9816
Manifest: +nightly +main

Note: This comment is updated whenever you push a commit.

@emilk changed the title from Revert "remove rmp-serde dependency" to Add back support for loading legacy .rrd files (Apr 28, 2025)
@emilk added the do-not-merge, include in changelog, and CLI labels (Apr 28, 2025)
@emilk added this to the 0.23.2 milestone (Apr 28, 2025)
@emilk changed the title from Add back support for loading legacy .rrd files to Support for loading legacy .rrd files (Apr 28, 2025)
@jprochazk
Member

I think we shouldn't allow loading 0.22 rrds at all. That sends the wrong message. Only the migrate command should be able to load them.

Can we do this in a way that doesn't affect the rest of the codebase? We removed these codepaths in the protobuf migration, because it would've been really annoying to keep them around. For example, we could duplicate the msgpack decoder in the migration tool.

@emilk changed the title from Support for loading legacy .rrd files to Add migration tool for legacy .rrd files (Apr 29, 2025)
@emilk force-pushed the emilk/rrd-migrator branch from 9833ee8 to 7a8879b (April 29, 2025 18:19)
@emilk changed the base branch from release-0.23.1 to main (April 29, 2025 18:19)
@emilk removed the do-not-merge label (Apr 29, 2025)
@emilk marked this pull request as ready for review (April 29, 2025 18:32)
Member

@jprochazk jprochazk left a comment

I am opposed to putting the old serialization code back into re_log_encoding. Can we either merge this directly into a release-0.23.2 branch so that it doesn't affect main (I saw that was the case before - why change it?), or do it in a way that doesn't affect the rest of the codebase, even if we have to duplicate relevant code into another place?

@emilk
Member Author

emilk commented Apr 30, 2025

Why are you opposed? I don't see it adding to our maintenance burden. It adds very little code to a piece of the codebase that we rarely change.

There are other fixes (improved re_sorbet migration, improved error messages etc) in this branch that I want landed on main, which is why I'm pointing this PR to main.

Sure, with some extra work we can split the PR in two, but I want a good motivation for that extra work.

@jprochazk
Member

Why are you opposed? I don't see it adding to our maintenance burden. It adds very little code to a piece of the codebase that we rarely change.

It is a lot of code. In re_log_encoding alone, this PR is +671 dense lines. I removed the msgpack stuff during the protobuf migration because of how entangled it was with the rest of the code: changes unrelated to msgpack ended up requiring some change to msgpack anyway, which made it unnecessarily difficult to work with.

I also feel like the code changes almost every week. According to the crate history, that almost holds up: the longest gap between two commits in the last few months was 20 days, between December and January (winter holiday). Other than that, I'm seeing bursts of commits with at most one or two weeks in between. I've got pending changes to it right now (#9826), and there is more work lined up, like getting rid of LogMsg (though, to be fair, that depends on priorities and is unlikely to happen soon).

@jprochazk
Member

I can only provide "negative" motivation, which is that the msgpack code will be removed again. Maybe not immediately, but definitely before the 0.24 release. It would be a lot easier if we didn't have to do that, and instead merged it straight into an 0.23.2 release.

@pablovela5620
Contributor

On my first conversion attempt using https://huggingface.co/datasets/pablovela5620/rrd-examples/blob/main/pda-example.rrd

I managed to get a technically successful conversion, but sadly got this error when trying to view

[image: screenshot of the error]

@jprochazk
Member

@pablovela5620

I managed to get a technically successful conversion, but sadly got this error when trying to view

If you run the same file through the migration command again (deleting the .backup.rrd file first so that it actually goes through), and then attempt to load it... Does that work?

wget "https://huggingface.co/datasets/pablovela5620/rrd-examples/resolve/main/pda-example.rrd?download=true" -O pda-example.rrd

pixi run rerun rrd migrate pda-example.rrd
uv run --with rerun-sdk==0.23.1 rerun pda-example.rrd

pixi run rerun rrd migrate pda-example.rrd
uv run --with rerun-sdk==0.23.1 rerun pda-example.rrd

@jprochazk
Member

jprochazk commented May 2, 2025

Never mind, I pushed a commit that should fix it. It turns out that in the first pass we only migrated msgpack -> protobuf; the record batches would only be migrated after loading the data as protobuf. Because you were running 0.23.1, that second step never happened: the code to migrate the data only exists on this branch.

So now the migrate command does both msgpack->protobuf and migrates arrow record batches (by going through re_sorbet like the viewer). A 0.21.0 recording after migration should now load in 0.23+
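
The difference between the first attempt and the fix can be sketched as a two-stage pipeline. Everything here is a toy model: the stage functions and the dictionary fields are hypothetical, and the real work goes through the decoder and re_sorbet, not these placeholders.

```python
from typing import Callable, Iterable, List

Stage = Callable[[dict], dict]


def decode_legacy(msg: dict) -> dict:
    """Stage 1 (hypothetical): re-encode a legacy msgpack message as protobuf."""
    return {**msg, "encoding": "protobuf"}


def migrate_record_batch(msg: dict) -> dict:
    """Stage 2 (hypothetical): bring the Arrow record batch up to the current schema."""
    return {**msg, "schema": "current"}


def migrate(messages: Iterable[dict], stages: List[Stage]) -> List[dict]:
    """Run every message through all stages, in order."""
    out = []
    for msg in messages:
        for stage in stages:
            msg = stage(msg)
        out.append(msg)
    return out


legacy = [{"encoding": "msgpack", "schema": "legacy"}]

# First attempt: only the encoding was migrated, so the output still carried
# a legacy schema that a 0.23.1 viewer could not upgrade on load.
first_attempt = migrate(legacy, [decode_legacy])

# After the fix: both stages run inside the migrate command itself.
fixed = migrate(legacy, [decode_legacy, migrate_record_batch])
```

Running both stages at migration time means the output file no longer depends on the viewer knowing how to upgrade old record batches.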

@jprochazk added the do-not-merge label (May 5, 2025)
@jprochazk changed the title from Add migration tool for legacy .rrd files to [DO NOT MERGE] Add migration tool for legacy .rrd files (May 5, 2025)
@jprochazk
Member

Adding DNM here, as it's not ready to be merged into main. Merging into release-0.23.2 in a separate branch: #9879

@emilk removed this from the 0.23.2 milestone (May 5, 2025)
@pablovela5620
Contributor

Tried the latest commit on this branch and at least for the 0.21 version it migrated correctly!

@jprochazk force-pushed the emilk/rrd-migrator branch from 0d580f4 to 018b863 (May 7, 2025 11:24)
@jprochazk changed the title from [DO NOT MERGE] Add migration tool for legacy .rrd files to Add migration tool for legacy .rrd files (May 7, 2025)
@jprochazk removed the do-not-merge label (May 7, 2025)
@jprochazk dismissed their stale review (May 7, 2025 11:27)

re_log_encoding was reverted

@jprochazk merged commit 1bdb4b7 into main (May 7, 2025)
37 checks passed
@jprochazk deleted the emilk/rrd-migrator branch (May 7, 2025 12:39)