parallel unique #35

@CJ-Wright

We need a parallel-friendly unique node.

Something like:

  1. data0 comes in.
  2. The node caches data0's future (dataC) and emits that future.
  3. data1 comes in.
  4. The node submits a function of data1 and dataC to the client. The function resolves the two futures, compares their values, and returns either the value of data1 (if it differs) or a not-unique sentinel.
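The four steps above can be sketched as follows. This is a minimal, non-authoritative sketch under several assumptions: the standard library's `concurrent.futures.ThreadPoolExecutor` stands in for the cluster client, uniqueness is judged only against the immediately preceding input, and the names `ParallelUnique`, `update`, and `NOT_UNIQUE` are hypothetical. Unlike a dask-style client, `ThreadPoolExecutor.submit` does not resolve futures passed as arguments, so the comparison task calls `.result()` itself. The key property is that `update` never blocks: it only caches futures and submits work.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Optional

# Hypothetical sentinel meaning "same value as the previous input".
NOT_UNIQUE = object()


def _compare(new_fut: Future, cached_fut: Future) -> Any:
    """Runs on a worker: resolve both futures and compare their values."""
    new_value = new_fut.result()
    if new_value == cached_fut.result():
        return NOT_UNIQUE
    return new_value


class ParallelUnique:
    """Hypothetical parallel-friendly unique node (compares with the last input only)."""

    def __init__(self, executor: ThreadPoolExecutor) -> None:
        self.executor = executor
        self.cached: Optional[Future] = None  # dataC: future of the last input

    def update(self, fut: Future) -> Future:
        if self.cached is None:
            # Steps 1-2: the first datum is unique; cache its future and emit it.
            self.cached = fut
            return fut
        # Step 4: submit the comparison instead of blocking on either future.
        out = self.executor.submit(_compare, fut, self.cached)
        # Cache the raw input, not `out`, which may resolve to the sentinel.
        self.cached = fut
        return out


# Usage: inputs are submitted before the comparisons that depend on them.
with ThreadPoolExecutor(max_workers=2) as ex:
    node = ParallelUnique(ex)
    inputs = [ex.submit(lambda v=v: v) for v in [1, 1, 2, 2, 3]]
    outputs = [node.update(f) for f in inputs]
    results = [f.result() for f in outputs]
```

Caching the raw input future rather than the emitted one is a deliberate choice in this sketch: the emitted future may resolve to the sentinel, which is useless as a basis for the next comparison.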

I can currently see two classes of use cases for this:

  1. The "map" use case. In this use case we don't want to operate on non-unique data (maybe because the operation is expensive). Therefore the not-unique sentinel acts like a null-compute sentinel.
  2. The "join" use case. In this use case we want to join new data with data which is unique. We can't just pass a null-compute here, since that would force every downstream node to run a null-compute too. Instead we want to join with the latest outcome which was unique. Note that this is different from a null-compute join, which can legitimately occur, for instance when the upstream node is a filter.

For "map"-like nodes we need the sentinel to tell us whether we should bother computing the outcome. For "join"-like nodes we need the sentinel to report the most recent unique future. Note that we can't have the "map"-like nodes resolve to a null-compute, since this would cause issues if map and join nodes were in the same branch.
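One way to satisfy both requirements, sketched below as an assumption rather than a settled design, is a sentinel that carries the most recent unique value. A map-like node skips its computation on the sentinel and re-wraps it with its own most recent output, so a join further down the same branch still sees a meaningful value; a join-like node unwraps it. `NotUnique`, `MapLike`, and `join_like` are hypothetical names, and the sketch works on already-resolved values to show the semantics only, not the parallel mechanics.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class NotUnique:
    """Hypothetical sentinel: marks a duplicate and carries the latest unique value."""
    latest: Any


class MapLike:
    """"map" use case: never call `func` on a duplicate."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func
        self.last = None  # most recent output computed from a unique input

    def __call__(self, value: Any) -> Any:
        if isinstance(value, NotUnique):
            # Skip the (possibly expensive) call; report this node's latest result.
            return NotUnique(self.last)
        self.last = self.func(value)
        return self.last


def join_like(new: Any, other: Any) -> Tuple[Any, Any]:
    """"join" use case: replace a duplicate with the latest unique value."""
    if isinstance(other, NotUnique):
        other = other.latest
    return (new, other)


calls = []


def double(x: Any) -> Any:
    calls.append(x)  # record which inputs were actually computed
    return 2 * x


node = MapLike(double)
mapped = [node(v) for v in [1, NotUnique(1), 3]]
joined = [join_like("tick", m) for m in mapped]
```

Here `double` runs only on the inputs 1 and 3, and the join for the duplicate reuses the mapped value 2 instead of degrading to a null-compute.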

This could also be problematic for zip joins, which may need to pass null-computes downstream because they are strict about the order in which inputs arrive.
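A small sequential sketch of why zip is awkward here, assuming zip pairs items strictly by arrival order (`zip_step`, `NOT_UNIQUE`, and `NULL_COMPUTE` are hypothetical names for illustration): the sentinel cannot simply be dropped, because the next item on that branch would then be paired with the wrong partner. It has to be consumed, and the zipped slot becomes a null-compute.

```python
from typing import Any, List

# Hypothetical sentinels for this sketch only.
NOT_UNIQUE = object()
NULL_COMPUTE = object()


def zip_step(buffers: List[list]) -> Any:
    """Emit one zipped result once every input buffer holds an item."""
    if not all(buffers):
        return None  # still waiting on at least one branch
    # Consume exactly one item per branch to keep the branches aligned.
    items = [buf.pop(0) for buf in buffers]
    if any(item is NOT_UNIQUE for item in items):
        return NULL_COMPUTE
    return tuple(items)


buffers = [[1, NOT_UNIQUE, 3], ["a", "b", "c"]]
out = []
while True:
    res = zip_step(buffers)
    if res is None:
        break
    out.append(res)
```

The "b" on the right branch is consumed alongside the sentinel, so 3 is still paired with "c".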
