Skip to content

Reimplement replicate() using the Active Memory Manager #6578

Open
@crusaderky

Description

@crusaderky

This issue is very tightly related to #4906.

replicate(), as well as its wrapper scatter(..., broadcast=True), have several issues:

Proposed design

Reimplement replicate() on top of the Active Memory Manager.
The command will just start an AMM policy for the involved keys.
The policy will track the keys and, every two seconds, create new replicas if there aren't enough.
Once the keys cease to exist, or the client calls replicate(n=1) on the same keys, the policy detaches itself from the AMM.

The number of desired replicas will be tracked through a replicate: <n> annotation on the involved keys.

Side effects

  • replicate() becomes non-blocking. It won't wait for replication to complete and won't even wait for keys to become in-memory.
  • You don't need to have all target workers online when you invoke replicate. As a matter of fact, replicate(n=inf) will become the default option; it means that if at any point in the future a new worker joins the cluster, the key will be immediately replicated onto it.

Optional additional feature

Alternatively, the client may also annotate the keys directly when building the graph:

a = ...
with dask.annotate(replicate=2):
    b = f(a)
b.compute(optimize_graph=False)

For this to work, when a new key lands on the scheduler, there must be machinery that parses the annotations, detects the replicate tag, and invokes Scheduler.replicate() under the hood.
This can be neatly implemented by a Scheduler plugin.

Proposed contentious breaking chances

  • Do not offer an option to make replicate blocking again
  • Do not offer an option to specify a subset of workers to replicate to (this would make the annotation much simpler)
  • The replicate() command will fail if the AMM is not enabled.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions