Write multiple chunks concurrently, split by topic and/or schema
More efficient reading
Currently, chunks are created based on the size of uncompressed bytes.
This means that if you have two topics - /large_topic and /small_topic - both publishing at 10 Hz, each chunk will contain messages from both topics.
If you later want to read only /small_topic, you still need to decompress all chunks, even though you only require a small subset of the data.
By splitting chunks by topic, we can place /large_topic and /small_topic into separate chunks that overlap in time.
This allows reading nodes to decompress only the chunks containing /small_topic, significantly improving read efficiency for selective topic access.
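A minimal sketch of how per-topic chunk grouping could look on the write path; `TopicChunkedWriter`, `Chunk`, and the 4 MiB threshold are hypothetical illustrations, not part of any existing writer API:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    topic: str
    messages: list = field(default_factory=list)
    size: int = 0  # uncompressed byte count of the buffered messages


class TopicChunkedWriter:
    """Buffers one open chunk per topic so each chunk holds a single topic."""

    def __init__(self, chunk_size_threshold: int = 4 * 1024 * 1024):
        self.chunk_size_threshold = chunk_size_threshold
        self.open_chunks: dict[str, Chunk] = {}
        self.closed_chunks: list[Chunk] = []  # stands in for "written to disk"

    def write(self, topic: str, payload: bytes) -> None:
        chunk = self.open_chunks.setdefault(topic, Chunk(topic))
        chunk.messages.append(payload)
        chunk.size += len(payload)
        # Only this topic's chunk is flushed when it fills up, so /large_topic
        # and /small_topic end up in separate, time-overlapping chunks.
        if chunk.size >= self.chunk_size_threshold:
            self.flush_chunk(topic)

    def flush_chunk(self, topic: str) -> None:
        chunk = self.open_chunks.pop(topic, None)
        if chunk is not None:
            self.closed_chunks.append(chunk)  # compress + write in a real writer
```

With this layout, a selective reader only has to decompress the chunks whose index mentions the requested topic.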
More efficient writing
Currently, chunks that fail to compress by at least 2% are stored uncompressed.
This often happens for data types that are already compressed - for example, sensor_msgs/msg/CompressedImage.
By splitting chunks by schema, we can separate messages like sensor_msgs/msg/CompressedImage into their own uncompressed chunks,
while grouping all other message types into compressed chunks.
This approach improves overall compression ratios for the compressed data,
while avoiding wasted CPU cycles trying to re-compress already-compressed payloads.
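A possible shape for the per-schema compression decision, keeping the existing 2% fallback; the `PRECOMPRESSED_SCHEMAS` set, the function names, and the use of zlib here are illustrative assumptions, not the writer's actual implementation:

```python
import zlib

# Schemas known to carry already-compressed payloads are routed to
# uncompressed chunks; this set is an assumption for illustration.
PRECOMPRESSED_SCHEMAS = {
    "sensor_msgs/msg/CompressedImage",
}


def should_compress(schema_name: str) -> bool:
    """Decide per schema whether a chunk should be compressed at all."""
    return schema_name not in PRECOMPRESSED_SCHEMAS


def finalize_chunk(schema_name: str, chunk_bytes: bytes) -> tuple[bytes, str]:
    """Return (records, compression) for a chunk that is being flushed."""
    if not should_compress(schema_name):
        # Skip the wasted CPU work entirely for already-compressed payloads.
        return chunk_bytes, ""
    compressed = zlib.compress(chunk_bytes)
    # Keep the existing fallback: store uncompressed if we save less than 2%.
    if len(compressed) > 0.98 * len(chunk_bytes):
        return chunk_bytes, ""
    return compressed, "zlib"
```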
Implementation Notes / Suggestions
- To prevent data loss, a maximum duration per chunk group could be enforced, after which the chunk group is flushed to disk regardless of its size.
- To prevent excessive RAM usage, an overall maximum buffer size could be enforced; if it is exceeded, the least recently used chunk group is flushed to disk (see the sketch below).
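One way the two safeguards could be combined, sketched under the assumption that chunk groups are keyed by topic and/or schema; `ChunkGroupBuffer`, its default limits, and the `flush` callback are all hypothetical:

```python
import time
from collections import OrderedDict


class ChunkGroupBuffer:
    """Flushes chunk groups on a max-open-duration and an overall RAM cap."""

    def __init__(self, flush, max_open_seconds=5.0, max_total_bytes=64 * 1024 * 1024):
        self.flush = flush                      # callable(group_key, payloads)
        self.max_open_seconds = max_open_seconds
        self.max_total_bytes = max_total_bytes
        self.groups: OrderedDict[str, list] = OrderedDict()
        self.opened_at: dict[str, float] = {}
        self.total_bytes = 0

    def add(self, group_key: str, payload: bytes) -> None:
        now = time.monotonic()
        if group_key not in self.groups:
            self.groups[group_key] = []
            self.opened_at[group_key] = now
        self.groups[group_key].append(payload)
        self.groups.move_to_end(group_key)      # mark as most recently used
        self.total_bytes += len(payload)

        # Flush any group open longer than the limit (bounds data loss on crash).
        expired = [k for k, t in self.opened_at.items()
                   if now - t >= self.max_open_seconds]
        for key in expired:
            self._flush_group(key)

        # Flush least recently used groups until we are back under the RAM cap.
        while self.total_bytes > self.max_total_bytes and self.groups:
            self._flush_group(next(iter(self.groups)))

    def _flush_group(self, group_key: str) -> None:
        payloads = self.groups.pop(group_key)
        self.opened_at.pop(group_key, None)
        self.total_bytes -= sum(len(p) for p in payloads)
        self.flush(group_key, payloads)
```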