
Support interned strings #36

@seanlinsley

Description


In Ruby, the Symbol type is a garbage-collected string type that is allocated only once per unique string. It would be useful for pco_store to offer similar functionality for string fields whose values are highly duplicated across the rows being stored.

There are a lot of string interning crates available for Rust with different APIs, so it's nontrivial to decide which we should support.

Requirements:

  • must be globally allocated so a state variable doesn't need to be passed around
  • must be garbage collected when no longer referenced
  • should have a convenient API so that interned strings can be used much like &str and String
  • must support serde
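A minimal sketch of how the requirements above could fit together, using only the standard library. It uses a process-wide table, so no state is passed around. Handles deref to `&str`, and an explicit `sweep()` frees strings that no handle references anymore. `IStr`, `sweep` and `interned_count` are hypothetical names, not an existing crate's API. Collection here is an explicit sweep rather than automatic on drop. serde support is left out to keep the block dependency-free; a `Serialize` impl could delegate to the inner `&str`.

```rust
use std::collections::HashSet;
use std::fmt;
use std::ops::Deref;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical interned-string handle: cheap to clone, derefs to `&str`.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct IStr(Arc<str>);

/// Process-wide table, so no interner state has to be passed around.
/// It holds one strong reference to every live interned string.
fn table() -> &'static Mutex<HashSet<Arc<str>>> {
    static TABLE: OnceLock<Mutex<HashSet<Arc<str>>>> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(HashSet::new()))
}

impl IStr {
    /// Returns a handle to the shared copy of `s`, allocating only the first time.
    pub fn new(s: &str) -> IStr {
        let mut set = table().lock().unwrap();
        if let Some(existing) = set.get(s) {
            return IStr(Arc::clone(existing));
        }
        let fresh: Arc<str> = Arc::from(s);
        set.insert(Arc::clone(&fresh));
        IStr(fresh)
    }

    /// True if both handles point at the same allocation.
    pub fn ptr_eq(a: &IStr, b: &IStr) -> bool {
        Arc::ptr_eq(&a.0, &b.0)
    }
}

/// Lets an `IStr` be used wherever a `&str` is expected.
impl Deref for IStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for IStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(&*self.0, f)
    }
}

impl fmt::Display for IStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Removes entries that only the table still references (strong count 1).
/// Returns how many strings were freed.
pub fn sweep() -> usize {
    let mut set = table().lock().unwrap();
    let before = set.len();
    set.retain(|s| Arc::strong_count(s) > 1);
    before - set.len()
}

/// Number of distinct strings currently held by the table.
pub fn interned_count() -> usize {
    table().lock().unwrap().len()
}
```

Sweeping under the lock, instead of removing entries in `Drop`, avoids a race where two handles dropped concurrently each see the other still alive and leave the entry behind.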

In terms of serialization, is there a crate whose serialization format removes duplicates internally (e.g. from a Vec<String>) to avoid the large up-front allocation when deserializing, or is that something we'll have to implement ourselves?
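It's unclear whether an existing crate does this. If we implement it ourselves, one option is dictionary encoding, sketched below. Each distinct string is stored once, plus one integer code per row. Decoding then allocates once per distinct value, and rows share that allocation through `Arc<str>`. `DictColumn`, `encode` and `decode` are hypothetical names. With serde's `derive` feature, `DictColumn` could derive `Serialize` and `Deserialize` as-is, since it only contains `Vec<String>` and `Vec<u32>`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical dictionary-encoded string column: each distinct value is stored
/// once in `dict`, and `codes[i]` is the index of row i's value in `dict`.
#[derive(Debug, PartialEq)]
pub struct DictColumn {
    pub dict: Vec<String>,
    pub codes: Vec<u32>,
}

/// Encodes rows, assigning codes in order of first appearance.
pub fn encode<S: AsRef<str>>(rows: &[S]) -> DictColumn {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut dict: Vec<String> = Vec::new();
    let mut codes = Vec::with_capacity(rows.len());
    for row in rows {
        let s = row.as_ref();
        // Only the first occurrence of a value copies it into `dict`.
        let code = *index.entry(s).or_insert_with(|| {
            dict.push(s.to_owned());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    DictColumn { dict, codes }
}

/// Decodes with one allocation per distinct string; rows share it via `Arc`.
pub fn decode(col: &DictColumn) -> Vec<Arc<str>> {
    let shared: Vec<Arc<str>> = col
        .dict
        .iter()
        .map(|s| Arc::<str>::from(s.as_str()))
        .collect();
    col.codes
        .iter()
        .map(|&c| Arc::clone(&shared[c as usize]))
        .collect()
}
```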

We should benchmark the different approaches (both CPU and RAM) to compare the performance improvements versus the added complexity.
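A back-of-the-envelope estimate can give a rough starting point for the RAM side of the comparison, showing when sharing pays off. These hypothetical helpers count `size_of::<String>()` or `size_of::<Arc<str>>()` per row plus the string bytes. They assume the `Arc` allocation header is two `usize` counters. They ignore allocator overhead, padding, spare capacity and the interner's own table, so real measurements are still needed.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

/// Rough size of storing every row as its own `String`
/// (assumes capacity == length, ignores allocator overhead).
pub fn estimate_owned(rows: &[&str]) -> usize {
    rows.iter().map(|s| size_of::<String>() + s.len()).sum()
}

/// Rough size of rows sharing one `Arc<str>` per distinct value: a fat pointer
/// per row, plus one allocation (assumed: two usize refcounts + bytes) per distinct string.
pub fn estimate_shared(rows: &[&str]) -> usize {
    let distinct: HashSet<&str> = rows.iter().copied().collect();
    let per_row = rows.len() * size_of::<Arc<str>>();
    let per_unique: usize = distinct
        .iter()
        .map(|s| 2 * size_of::<usize>() + s.len())
        .sum();
    per_row + per_unique
}
```

For highly duplicated columns the shared estimate is much smaller. For mostly-unique short strings it is larger, because each distinct value also pays for a refcount header.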
