
Write multiple chunks concurrently, split by topic and/or schema #2220

@mrkbac

Description


More efficient reading

Currently, chunks are created based on the size of uncompressed bytes.
This means that if you have two topics - /large_topic and /small_topic - both publishing at 10 Hz, each chunk will contain messages from both topics.

If you later want to read only /small_topic, you still need to decompress all chunks, even though you only require a small subset of the data.

By splitting chunks by topic, we can place /large_topic and /small_topic into separate, time-overlapping chunks.
Readers then only need to decompress the chunks containing /small_topic, which significantly improves read efficiency for selective topic access.
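As a rough sketch of what this could look like on the writer side (plain Python, not the real MCAP writer API - the ChunkGroup class, flush_chunk callback, and size threshold are illustrative assumptions), the writer keeps one chunk buffer per topic and flushes each buffer independently:

```python
# A minimal sketch of per-topic chunk buffering (plain Python, no MCAP
# library calls). ChunkGroup, flush_chunk and the size threshold are
# hypothetical names for illustration only.
from dataclasses import dataclass, field

CHUNK_SIZE_THRESHOLD = 4 * 1024 * 1024  # flush a group once it buffers ~4 MiB uncompressed


@dataclass
class ChunkGroup:
    topic: str
    messages: list = field(default_factory=list)
    uncompressed_size: int = 0


class TopicSplitWriter:
    def __init__(self, flush_chunk):
        # flush_chunk(topic, messages) stands in for whatever code compresses
        # the buffered messages and appends one chunk record to the file.
        self._flush_chunk = flush_chunk
        self._groups = {}

    def add_message(self, topic: str, payload: bytes) -> None:
        group = self._groups.setdefault(topic, ChunkGroup(topic))
        group.messages.append(payload)
        group.uncompressed_size += len(payload)
        # Each topic fills its own chunk, so /large_topic and /small_topic end
        # up in separate (time-overlapping) chunks instead of sharing one.
        if group.uncompressed_size >= CHUNK_SIZE_THRESHOLD:
            self._flush_chunk(topic, group.messages)
            self._groups[topic] = ChunkGroup(topic)

    def close(self) -> None:
        # Flush whatever is still buffered when the recording ends.
        for topic, group in self._groups.items():
            if group.messages:
                self._flush_chunk(topic, group.messages)
```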

More efficient writing

Currently, chunks that fail to compress by at least 2% are stored uncompressed.
This often happens for data types whose payloads are already compressed - for example, sensor_msgs/msg/CompressedImage.

By splitting chunks by schema, we can separate messages like sensor_msgs/msg/CompressedImage into their own uncompressed chunks,
while grouping all other message types into compressed chunks.

This approach improves overall compression ratios for the compressed data,
while avoiding wasted CPU cycles trying to re-compress already-compressed payloads.
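A rough sketch of how that routing decision could be made, under the same assumptions as the snippet above; the ALREADY_COMPRESSED_SCHEMAS set and the group_key helper are hypothetical names, and the exact grouping policy (per topic, per schema, or both) is left open here:

```python
# Hypothetical routing rule: messages whose schema is known to carry
# already-compressed payloads go into their own uncompressed chunk groups;
# everything else shares compressed groups.
ALREADY_COMPRESSED_SCHEMAS = {
    "sensor_msgs/msg/CompressedImage",
}


def group_key(topic: str, schema_name: str) -> tuple[str, bool]:
    """Return (group key, compress?) for the chunk group a message belongs to."""
    if schema_name in ALREADY_COMPRESSED_SCHEMAS:
        # Own group, stored uncompressed: skips a compression attempt that
        # would fail the 2% threshold anyway.
        return (f"uncompressed:{schema_name}", False)
    # All other message types are grouped and compressed, keyed by topic here.
    return (f"compressed:{topic}", True)
```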

Implementation Notes / Suggestions

  • To prevent data loss, a maximum duration per chunk group could be enforced, after which the chunk group is flushed to disk regardless of size.
  • To prevent excessive RAM usage, an overall maximum buffer size could be enforced; when it is exceeded, the least recently used chunk group is flushed to disk (both safeguards are sketched below).
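A self-contained sketch of both safeguards, assuming each buffered group tracks its size and the times of its first and most recent messages; the limit values, BufferedGroup fields, and flush_chunk callback are illustrative, not a proposed API:

```python
# Illustrative flush policies: a per-group age limit and an overall memory
# cap with least-recently-used eviction. All names and limits are assumptions.
import time
from dataclasses import dataclass, field

MAX_GROUP_AGE_SEC = 10.0                   # flush any group buffered longer than this
MAX_TOTAL_BUFFER_BYTES = 64 * 1024 * 1024  # cap on all buffered groups combined


@dataclass
class BufferedGroup:
    messages: list = field(default_factory=list)
    uncompressed_size: int = 0
    first_message_time: float = 0.0  # monotonic time when the group was started
    last_message_time: float = 0.0   # monotonic time of the most recent message


def enforce_limits(groups, flush_chunk, now=None):
    now = time.monotonic() if now is None else now

    # 1. Time-based flush: bounds how much buffered data a crash could lose.
    for key, group in list(groups.items()):
        if group.messages and now - group.first_message_time >= MAX_GROUP_AGE_SEC:
            flush_chunk(key, group.messages)
            del groups[key]

    # 2. Memory cap: flush least-recently-used groups until back under the limit.
    total = sum(g.uncompressed_size for g in groups.values())
    while total > MAX_TOTAL_BUFFER_BYTES and groups:
        key, group = min(groups.items(), key=lambda kv: kv[1].last_message_time)
        flush_chunk(key, group.messages)
        total -= group.uncompressed_size
        del groups[key]
```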
