Better handling of large bag files (backport #178) #201
There were two problems in the codebase:
1. timeline._update_index_cache was called per invalidated topic, which resulted in multiple reads of the whole bag file (each topic seeks the reader to the bag start and forces it to read through the whole bag). There might be some caching along the way, so that rqt_bag doesn't read 20 x 12 GB for a 12 GB file with 20 topics, but it definitely read much more than 12 GB.
2. BagTimeline.get_entries() cached all entries in order to sort them. This was a smaller problem in the per-topic scenario, but a big problem in the all-topic one. But even in the per-topic scenario, rqt_bag needlessly cached gigabytes of entries (e.g. for an image topic) just to sort them.
This PR fixes both issues by using a set of generators, one per bag file, which produce messages simultaneously; the earliest of all pending messages is taken next.
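To illustrate the idea of one generator per bag file with the earliest message taken first, here is a minimal sketch. It is not the code from this PR: the `Entry` tuple layout, `read_entries` and `merge_bag_entries` are hypothetical names used only for illustration, and the per-bag readers are assumed to already yield entries in timestamp order. The standard-library `heapq.merge` keeps only one pending entry per generator in memory, so nothing has to be cached and sorted up front.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical entry layout: (timestamp_ns, topic, bag_name).
Entry = Tuple[int, str, str]


def read_entries(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Stand-in for a per-bag reader that yields entries in timestamp order."""
    for entry in entries:
        yield entry


def merge_bag_entries(per_bag: List[Iterator[Entry]]) -> Iterator[Entry]:
    """Lazily yield entries from all bags, earliest timestamp first.

    Only one pending entry per bag is held at a time, so memory use does
    not grow with the size of the bags.
    """
    return heapq.merge(*per_bag, key=lambda entry: entry[0])


# Two small in-memory "bags" standing in for real bag files.
bag_a = [(1, "/camera", "a"), (4, "/imu", "a"), (9, "/camera", "a")]
bag_b = [(2, "/imu", "b"), (3, "/camera", "b"), (10, "/imu", "b")]

merged = list(merge_bag_entries([read_entries(bag_a), read_entries(bag_b)]))
timestamps = [entry[0] for entry in merged]
```

In this sketch each bag is read once from start to end, regardless of how many topics it contains, which is the property the description above relies on.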
I also got the progress bar working much more nicely (however, it only seems to work when loading the first bag).
Tests
The tests were performed on a 12 GB MCAP bag. Before each test, the bag file was evicted from the filesystem cache.
I also wanted to test on a 70 GB bag file, but I didn't have enough time or RAM to wait for it to load from a USB3 SSD. With this PR, it should be easy to open it (however, I can only verify that tomorrow).
Without this PR
Obrazovkove.vysilani.2025-05-08.13.49.15.mp4
With this PR
Obrazovkove.vysilani.2025-05-08.13.47.10.mp4
The problem mentioned in #166, that clicking on the timeline is very slow on large bags, still remains. That is left for a future PR.
This is an automatic backport of pull request #178 done by Mergify.