Add explicit cloning to query2 #119

Open
@johan-bjareholt

Description

Problem

Currently, every time we reference a variable we clone the whole value, to guarantee that a function cannot modify the variable's value without it being re-assigned.
If explicit cloning were built into the language, so that cloning is only done when necessary, querying would probably be much faster.
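
To make the cost concrete, here is a minimal sketch of clone-on-reference behaviour. `CloningEnv` and this `filter_keyvals` are hypothetical stand-ins written for illustration, not the actual query2 code, and events are assumed to be plain dicts.

```python
import copy


class CloningEnv:
    """Toy variable store that deep-copies on every read (hypothetical)."""

    def __init__(self):
        self.vars = {}
        self.clones = 0

    def set(self, name, value):
        self.vars[name] = value

    def get(self, name):
        # Every reference pays for a full copy, needed or not.
        self.clones += 1
        return copy.deepcopy(self.vars[name])


def filter_keyvals(events, key, vals):
    # Filters in place, the way a consuming transform might.
    events[:] = [e for e in events if e.get(key) in vals]
    return events


env = CloningEnv()
env.set("events1", [{"key": "val"}, {"key": "other"}])
env.set("events2", filter_keyvals(env.get("events1"), "key", ["val"]))
env.set("events3", filter_keyvals(env.get("events1"), "key", ["val"]))
```

Both reads of `events1` are copied, even though the second one is the last use and could have handed over the original.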

The Python implementation seems to be inconsistent: some transforms consume the data and others don't, and the query2 language doesn't seem to clone variables, so things can get very complicated. This needs further investigation, though.

Suggestions on how it could be solved

1. New default is to assume mutable reference

clone(data) returns cloned data
*data to specify that variable will be consumed and undefined after this. Needed to avoid unnecessary cloning
This will not work, because the same value can end up in multiple variables, which breaks ownership rules.

events1 = query_bucket("bucket");
events2 = filter_keyvals(events1, "key", ["val"]);
# OK, BUT this will modify events1, so it might be unclear to the user what is occurring
events3 = filter_keyvals(clone(events1), "key", ["val"]);
# OK, at this point both events1 and events2 are defined
events4 = filter_keyvals(*events1, "key", ["val"]);
# OK, at this point events1 is undefined as it was consumed by *events1
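
Python lists are passed by reference, so they happen to behave like this option's default. The sketch below shows the aliasing a user would run into; `filter_keyvals` is again a hypothetical in-place version, and `copy.deepcopy` stands in for `clone(data)`.

```python
import copy


def filter_keyvals(events, key, vals):
    # Filters in place through the reference it was given.
    events[:] = [e for e in events if e.get(key) in vals]
    return events


events1 = [{"key": "val"}, {"key": "other"}]
snapshot = copy.deepcopy(events1)                   # explicit clone(events1)
events2 = filter_keyvals(events1, "key", ["val"])   # mutable reference by default
```

After the last line `events2` and `events1` are the same object, so `events1` has silently lost an event, while `snapshot` is unaffected.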

2. New default is to assume immutable reference

clone(data) returns cloned data
*data to specify that variable will be consumed and undefined after this. Needed to avoid unnecessary cloning.
This will not work, because the same value can end up in multiple variables, which breaks ownership rules.

events1 = query_bucket("bucket");
events2 = filter_keyvals(events1, "key", ["val"]);
# Error: Not possible to pass events1 into filter_keyvals as it needs mutable data
events3 = filter_keyvals(clone(events1), "key", ["val"]);
# OK, at this point both events1 and events2 are defined
events4 = filter_keyvals(*events1, "key", ["val"]);
# OK, at this point events1 is undefined as it was consumed by *events1

3. New default is to assume consume

clone(data) returns cloned data
&data to specify that variable will be used as a reference

events1 = query_bucket("bucket");
events2 = filter_keyvals(events1, "key", ["val"]);
# OK as it is consumed by default, but after this line events1 is undefined, so we need to either clone it first or query the bucket again
events1 = query_bucket("bucket");
events3 = filter_keyvals(clone(events1), "key", ["val"]);
# OK, at this point both events1 and events2 are defined
events4 = filter_keyvals(events1, "key", ["val"]);
# OK, at this point events1 is undefined as it was consumed (the default in this option)
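
A rough sketch of what consume-by-default could look like inside an interpreter. `ConsumingEnv`, `take` and `clone` are names invented for this example, and the `&data` reference form is left out for brevity.

```python
import copy


class ConsumingEnv:
    """Toy variable store where reading a variable moves it out (hypothetical)."""

    def __init__(self):
        self.vars = {}

    def set(self, name, value):
        self.vars[name] = value

    def take(self, name):
        # Default access: hand over the value and leave the name undefined.
        if name not in self.vars:
            raise NameError(f"{name} is undefined (already consumed?)")
        return self.vars.pop(name)

    def clone(self, name):
        # Explicit clone(data): the variable stays defined.
        return copy.deepcopy(self.vars[name])


def filter_keyvals(events, key, vals):
    # Owns its argument, so filtering in place is safe.
    events[:] = [e for e in events if e.get(key) in vals]
    return events


env = ConsumingEnv()
env.set("events1", [{"key": "val"}, {"key": "other"}])
env.set("events3", filter_keyvals(env.clone("events1"), "key", ["val"]))
env.set("events2", filter_keyvals(env.take("events1"), "key", ["val"]))
```

With this design the only copy made is the one the query asked for, and a second `take("events1")` fails loudly instead of silently reading stale data.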

4. Keep as is, but introduce some pre-processing step which does reference counting before executing

Since we don't have any loops or support for non-transform functions, this would be possible to do, but still a bit messy to get working. The result might not be perfect either.
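
One way such a pre-processing pass might look, assuming the query has already been flattened into straight-line `(target, [argument names])` statements. That representation, the `plan_clones` name and the `"RETURN"` marker are assumptions made for this sketch.

```python
from collections import Counter


def plan_clones(statements):
    """Decide, per argument, whether a clone is needed.

    `statements` is a list of (target, [argument variable names]) in
    execution order. Returns one list of booleans per statement: True
    means the variable is read again later and must be cloned, False
    means this is its last read and the value can be handed over.
    """
    # First pass: count how many times each variable name is read.
    remaining = Counter()
    for _target, args in statements:
        remaining.update(args)

    # Second pass: a read needs a clone only if more reads follow it.
    plan = []
    for _target, args in statements:
        flags = []
        for name in args:
            remaining[name] -= 1
            flags.append(remaining[name] > 0)
        plan.append(flags)
    return plan


# Hypothetical flattened form of a query; "RETURN" marks the final result.
statements = [
    ("events1", []),            # events1 = query_bucket("bucket")
    ("events2", ["events1"]),   # events2 = filter_keyvals(events1, ...)
    ("events3", ["events1"]),   # events3 = filter_keyvals(events1, ...)
    ("RETURN", ["events3"]),
]
plan = plan_clones(statements)
```

The pass counts reads per name, so a re-assigned variable may get an extra clone but never a missing one. That is one way the result might end up imperfect.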

Labels: enhancement (New feature or request)