Reconsider Value implementation #437

@ed255

Related to #409

#409 proposes refactoring the Value implementation to make better use of Arc, so that working with the underlying types is more ergonomic and cloning any kind of Value is cheap because the inner data sits behind an Arc.

This solves the ergonomics and cloning problems, but a problem remains in serialization and storage.
The benefit of Arc disappears once we serialize and deserialize (whether to send over the network or to store on disk): serialization writes out the data behind every Arc in full, even when the same value was already serialized, and deserialization does not deduplicate the result back into shared Arcs.

I would like to propose a redesign in which Value keeps only references to "big" types and the data itself lives in a shared structure.
For example, a Value could store just the value type and the raw_value, with a shared map from raw_value to the String, Dict, Set, ....

Cloning a Value would then be cheap: it copies only the raw_value and the type, and reuses the existing entry in the shared map.
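To make the idea concrete, here is a minimal sketch of what such a layout might look like. Everything in it is an assumption for illustration: `RawValue` is a `u64` stand-in rather than the project's real raw value type, `hash_of` uses the standard library hasher in place of the real commitment, and the names `Store`, `Data` and `ValueType` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for the real raw value (assumption: a plain u64).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct RawValue(pub u64);

/// Hypothetical type tag carried by every Value.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ValueType {
    Int,
    String,
    Set,
}

/// A Value is only a type tag plus a raw value, so it can be `Copy`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Value {
    pub ty: ValueType,
    pub raw: RawValue,
}

/// The "big" payloads that live in the shared map instead of in Value.
#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    String(String),
    Set(Vec<Value>),
}

/// Hypothetical shared map from raw value to payload.
#[derive(Default)]
pub struct Store {
    data: HashMap<RawValue, Data>,
}

/// Placeholder for the real hash of a payload (assumption: std hasher).
fn hash_of<T: Hash>(t: &T) -> RawValue {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    RawValue(h.finish())
}

impl Store {
    /// Small values need no entry in the map at all.
    pub fn int(&self, i: i64) -> Value {
        Value { ty: ValueType::Int, raw: RawValue(i as u64) }
    }

    /// Store the string once; later calls with equal content reuse the entry.
    pub fn string(&mut self, s: &str) -> Value {
        let raw = hash_of(&s);
        self.data.entry(raw).or_insert_with(|| Data::String(s.to_owned()));
        Value { ty: ValueType::String, raw }
    }

    /// Dereference a Value to its payload, if it has one in this store.
    pub fn get(&self, v: &Value) -> Option<&Data> {
        self.data.get(&v.raw)
    }

    /// Number of payloads currently held.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

In this sketch, copying a `Value` never touches the payload, and creating the same string twice leaves a single entry in the map. The store is passed explicitly here; whether that is the right choice is one of the open questions.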

Challenges:

  • Serialization and deserialization become trickier (but more efficient). First we'd serialize all structures with their value references left as-is (not dereferenced), noting every reference seen along the way. Then we'd serialize the data associated with each seen reference.
  • Garbage collection: we may need to add reference counting to the shared map so that we can delete entries that no reference uses anymore.
  • Error handling: where do we keep this shared map? Is it a parameter that we pass everywhere, or does each value keep a pointer to its map? In the latter case, what happens if we mix values that use different maps?
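The first two challenges can be sketched roughly as follows. This is only an illustration under stated assumptions: `Raw` is a `u64` stand-in for the raw value, `Store`, `Bundle`, `acquire` and `release` are hypothetical names, and the in-memory `Bundle` stands in for whatever wire format would actually be used. The sketch assumes that placing a reference inside a Set hands that reference's count over to the Set.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in for the real raw value (assumption).
pub type Raw = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Value {
    Int(i64),
    Ref(Raw), // reference into the shared map
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Data {
    String(String),
    Set(Vec<Value>),
}

/// Hypothetical shared map with a reference count per entry.
#[derive(Default)]
pub struct Store {
    entries: HashMap<Raw, (Data, usize)>,
}

/// Serialized form: values kept as references, each payload stored once.
#[derive(Debug, PartialEq, Eq)]
pub struct Bundle {
    pub values: Vec<Value>,
    pub data: BTreeMap<Raw, Data>,
}

/// Collect the references that appear directly in a slice of values.
fn refs(values: &[Value]) -> impl Iterator<Item = Raw> + '_ {
    values.iter().filter_map(|v| match v {
        Value::Ref(r) => Some(*r),
        Value::Int(_) => None,
    })
}

impl Store {
    /// Hand out one more reference, inserting the payload on first use.
    pub fn acquire(&mut self, raw: Raw, data: Data) -> Value {
        let entry = self.entries.entry(raw).or_insert((data, 0));
        entry.1 += 1;
        Value::Ref(raw)
    }

    /// Give a reference back; delete the entry when nothing uses it.
    pub fn release(&mut self, v: Value) {
        let Value::Ref(raw) = v else { return };
        let mut dead = false;
        if let Some(entry) = self.entries.get_mut(&raw) {
            entry.1 -= 1;
            dead = entry.1 == 0;
        }
        if dead {
            // A deleted Set gives back the references it was holding.
            if let Some((Data::Set(items), _)) = self.entries.remove(&raw) {
                for item in items {
                    self.release(item);
                }
            }
        }
    }

    /// Pass 1 keeps the values as references and notes which ones were seen.
    /// Pass 2 emits the payload of every seen reference exactly once.
    /// Returns None if a reference is not present in this store.
    pub fn serialize(&self, values: &[Value]) -> Option<Bundle> {
        let mut pending: Vec<Raw> = refs(values).collect();
        let mut data = BTreeMap::new();
        while let Some(raw) = pending.pop() {
            if data.contains_key(&raw) {
                continue; // already emitted
            }
            let (payload, _) = self.entries.get(&raw)?;
            if let Data::Set(items) = payload {
                pending.extend(refs(items)); // follow nested references
            }
            data.insert(raw, payload.clone());
        }
        Some(Bundle { values: values.to_vec(), data })
    }

    /// Number of payloads currently held.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Here `serialize` returns `None` when a value refers to an entry the store does not have, which is one possible way to surface the "values from different maps" case; it does not settle where the map should live.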

We could also use this shared map to store custom predicate batches.

Note that this is a big refactor.

Labels: enhancement (New feature or request), refactor (Refactoring task that may be left for the appropriate time)
