Task.Command should not evaluate more than once #6437

@lefou

Description

Command tasks are not unique in the graph

When a user defines a command, we currently end up with a command factory that produces a new task for each call site. As a result, more than one task can represent the same command. This is the desired behavior when the command arguments differ, but when the arguments are equal, there is no real reason to run the command multiple times. In addition to the wasteful repeated work, there is a hidden concurrency issue: multiple tasks representing the same command are likely to run in parallel, because their dependencies eventually become resolved at the same time.

In summary, here are the issues:

  1. Command tasks may run concurrently, although they shouldn't as they share the same Task.dest.

  2. Command tasks with identical arguments run multiple times (concurrently), although they should not, as their outcome can be considered identical.

Proposed Solution: re-run avoidance and synchronization

Commands are not cached, but their results are persisted to JSON, so they could be reused. The executor should track which commands have already run and avoid re-running a task that represents a command with arguments that have already been run.

Since commands currently don't require a JSON reader, the results of commands that have already run need to be held in memory and can't (currently) be read back from JSON. This should not be an issue, since we already hold all task results of a single evaluation run in memory.

In addition, we need to synchronize all tasks that access the same Task.dest, to avoid concurrent access.
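
A minimal sketch of how re-run avoidance and synchronization could fit together, assuming a per-evaluation in-memory registry. `CommandKey`, `CommandRunRegistry`, and `runOnce` are hypothetical names for illustration and are not part of Mill's executor API. The dest is modelled as a plain string.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical identity of a command invocation: its name plus its arguments.
final case class CommandKey(name: String, args: Seq[String])

// Hypothetical registry that lives for a single evaluation run.
// It is an illustration only, not Mill's actual executor.
final class CommandRunRegistry {
  // Results of commands that already ran, held in memory only.
  // Values are wrapped in Option because ConcurrentHashMap rejects null values.
  private val results = new ConcurrentHashMap[CommandKey, Option[Any]]()
  // One lock object per dest, so tasks sharing a dest never run concurrently.
  private val destLocks = new ConcurrentHashMap[String, Object]()

  def runOnce[T](key: CommandKey, dest: String)(body: => T): T = {
    val lock = destLocks.computeIfAbsent(dest, (_: String) => new Object)
    lock.synchronized {
      results.get(key) match {
        case null =>
          // First invocation with these arguments: run and remember the result.
          val value = body
          results.put(key, Some(value))
          value
        case Some(value) =>
          // Same command, same arguments: reuse the earlier result.
          value.asInstanceOf[T]
        case None =>
          // Never stored by this class; present only to keep the match exhaustive.
          throw new IllegalStateException(s"no result recorded for $key")
      }
    }
  }
}
```

In this sketch, two invocations with equal arguments evaluate `body` once, while invocations with different arguments still run one after another because they share the same dest lock.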

Alternative Solution: re-run avoidance and different Task.dest

Instead of synchronizing all tasks, we could give each command task a different Task.dest that is derived from its arguments. This could be either a string representation mapped to sub-dirs (as we do with cross-modules) or a hash (if the string might get too long; this could be configured via a flag). This is somewhat analogous to the parametrized task proposal.
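
One way the dest derivation could look, as a sketch under assumptions: `CommandDest`, `destSegment`, the 64-character threshold, and the set of "safe" characters are all hypothetical and not taken from Mill. Arguments are used verbatim when they are short and filesystem-friendly, otherwise a truncated SHA-256 hash is used.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, not part of Mill.
object CommandDest {
  // Assumed default; the threshold could instead come from a flag.
  val DefaultMaxSegmentLength = 64

  // Characters assumed safe in a directory name on common filesystems.
  private def isSafe(c: Char): Boolean =
    c.isLetterOrDigit || c == '-' || c == '_' || c == '.'

  private def sha256Hex(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Returns the sub-directory name to append to the command's base dest.
  def destSegment(args: Seq[String], maxLength: Int = DefaultMaxSegmentLength): String = {
    val readable = args.mkString(",")
    val allSafe = args.nonEmpty && args.forall(a => a.nonEmpty && a.forall(isSafe))
    if (allSafe && readable.length <= maxLength) readable
    else {
      // Length-prefix each argument so that distinct argument lists
      // always produce distinct hash inputs.
      val encoded = args.map(a => s"${a.length}:$a").mkString
      sha256Hex(encoded).take(16)
    }
  }
}
```

Because `,` is not in the safe set, the readable form is unambiguous for the argument lists it accepts. Truncating the hash to 16 hex characters keeps paths short at the cost of a very small collision probability.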
