Description
Related to #409
#409 proposes refactoring the Value implementation to better use Arc so that using the underlying types is more ergonomic and cloning all kinds of Value becomes cheap because the inner data is under Arc.
This solves the ergonomics and cloning problems, but a problem remains in serialization and storage.
The benefit of the Arc disappears once we serialize and deserialize (either to send over the network or to store on disk): serialization does not skip a value under an Arc that was already serialized, and deserialization does not deduplicate values that were previously shared under an Arc.
I would like to propose a redesign where in Value we only keep references to "big" types, and store the data in a shared structure.
For example, Value could store just the value type and the raw_value, with a shared map from raw_value to the underlying String, Dict, Set, etc.
Then cloning a Value would be cheap because it would just copy the raw_value and type and reuse the entry in the shared map.
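A minimal sketch of what this could look like, under several assumptions that are not part of the proposal: the names `Store`, `Data`, `intern`, and `release` are hypothetical, only String and Set are modeled, and raw_value is approximated with a 64-bit `DefaultHasher` hash of the content (collisions are ignored here; a real implementation would need a collision-resistant raw_value). The entry also carries an explicit reference count, as one possible answer to the garbage collection question.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical tag describing what kind of data a handle refers to.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ValueType { String, Set }

/// Hypothetical lightweight Value: only the type and raw_value.
/// It is `Copy`, so cloning never touches the underlying data.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Value { pub ty: ValueType, pub raw: u64 }

/// The "big" data, stored once in the shared map.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Data { String(String), Set(Vec<Value>) }

/// Shared map from raw_value to (data, manual reference count).
#[derive(Default)]
pub struct Store { entries: HashMap<u64, (Data, usize)> }

impl Store {
    /// Insert the data if it is not present yet, bump its count,
    /// and return a handle to it.
    pub fn intern(&mut self, data: Data) -> Value {
        // Assumption: a std hash stands in for the real raw_value.
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let raw = hasher.finish();
        let ty = match &data {
            Data::String(_) => ValueType::String,
            Data::Set(_) => ValueType::Set,
        };
        let entry = self.entries.entry(raw).or_insert((data, 0));
        entry.1 += 1;
        Value { ty, raw }
    }

    /// Look up the data behind a handle, if it is still stored.
    pub fn get(&self, v: Value) -> Option<&Data> {
        self.entries.get(&v.raw).map(|(data, _)| data)
    }

    /// Drop one reference; delete the entry when the count reaches zero.
    pub fn release(&mut self, v: Value) {
        let remove = match self.entries.get_mut(&v.raw) {
            Some(entry) => {
                entry.1 -= 1;
                entry.1 == 0
            }
            None => false,
        };
        if remove {
            self.entries.remove(&v.raw);
        }
    }

    /// Number of distinct entries currently stored.
    pub fn len(&self) -> usize { self.entries.len() }
}
```

Because `Value` is `Copy` in this sketch, copies are not counted automatically; only `intern` and `release` change the count. Whether counting should be explicit like this or tied to `Clone`/`Drop` is one of the open design questions.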
Challenges:
- Serialization and deserialization become trickier (but more efficient). First we'd serialize all structures with their value references left as-is (not dereferenced), noting which references were seen. Then we'd serialize the data associated with each seen reference, once per reference.
- Garbage collection: we may need to implement a reference counting mechanism for this shared map so that we can delete entries that no reference is using.
- Error handling: where do we keep this shared map? Is it a parameter that we need to pass everywhere? Or does each value keep a pointer to this map? If it's the second case, what happens if we mix values that use different maps?
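The two-phase serialization described in the first challenge could be sketched roughly as follows. Everything here is an assumption for illustration: `Ref` stands in for the (type, raw_value) pair, and `Bundle`, `export`, and `import` are hypothetical names. The sketch only builds the deduplicated structure that would be written out; the actual byte encoding is left out.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical reference into the shared map (stands in for type + raw_value).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Ref(pub u64);

/// The "big" data kept in the shared map; a Set holds references, not data.
#[derive(Clone, Debug, PartialEq)]
pub enum Data { String(String), Set(Vec<Ref>) }

/// What would be serialized: the top-level references as-is, plus each
/// reachable entry of the shared map exactly once.
#[derive(Debug, PartialEq)]
pub struct Bundle { pub roots: Vec<Ref>, pub table: BTreeMap<Ref, Data> }

/// Phase 1 keeps the roots as bare references and notes which references
/// are seen; phase 2 emits the data behind each seen reference once.
/// Both phases are folded into one worklist loop here.
pub fn export(roots: &[Ref], store: &HashMap<Ref, Data>) -> Bundle {
    let mut table = BTreeMap::new();
    let mut pending: Vec<Ref> = roots.to_vec();
    while let Some(r) = pending.pop() {
        if table.contains_key(&r) {
            continue; // already emitted, skip duplicates
        }
        // References missing from the store are skipped silently in this
        // sketch; a real implementation would likely return an error.
        if let Some(data) = store.get(&r) {
            if let Data::Set(children) = data {
                pending.extend(children.iter().copied());
            }
            table.insert(r, data.clone());
        }
    }
    Bundle { roots: roots.to_vec(), table }
}

/// Merge a received bundle into a (possibly different) shared map,
/// reusing entries that are already present, and return the roots.
pub fn import(bundle: Bundle, store: &mut HashMap<Ref, Data>) -> Vec<Ref> {
    for (r, data) in bundle.table {
        store.entry(r).or_insert(data);
    }
    bundle.roots
}
```

With this shape, a value referenced many times costs one table entry plus one small reference per occurrence, and deserialization deduplicates naturally because entries are keyed by reference. It also touches the "different maps" question: `import` is the one place where values move from one map into another.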
We could also use this shared map to store custom predicate batches.
Note that this is a big refactor.