When using Afar one worker stops getting tasks. #32

@ncclementi

I was running a workflow using Afar on Coiled, and I noticed that in the Afar version one worker stopped receiving tasks partway through the computation. In the task stream of the performance reports, the last thread in the Afar version stops getting tasks, while this doesn't happen in the non-Afar version. Is this the expected behavior? What is actually happening here?

Note: the data is public so this should work as a reproducible example.

Workflow without afar:

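import dask.dataframe as dd
from dask.distributed import performance_report

# Lazily read the public timeseries dataset from S3 (anonymous access).
ddf = dd.read_parquet(
    "s3://coiled-datasets/timeseries/20-years/parquet",
    storage_options={"anon": True, "use_ssl": True},
    split_row_groups=True,
    engine="pyarrow",
)

# Record a performance report while the groupby-mean runs.
with performance_report(filename="read_pq_groupby_mean_CPU_pyarrow.html"):
    ddf.groupby('name').x.mean().compute()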

Link to performance report

Workflow with afar:

%%time
with afar.run, remotely:
    ddf_cpu = dd.read_parquet(
        "s3://coiled-datasets/timeseries/20-years/parquet",
        storage_options={"anon": True, "use_ssl": True},
        split_row_groups=True,
        engine="pyarrow",
    )

    res = ddf_cpu.groupby('name').x.mean().compute()

with performance_report(filename="read_pq_groupby_mean_CPU_pyarrow_afar.html"):
    res.result()

Link to performance report
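
In case it helps to quantify what the task stream shows, here is a minimal sketch (not part of the original runs) that flags threads whose last task finished well before the rest. It assumes the task-stream information has already been flattened into plain dicts with the hypothetical keys "worker", "thread", "start" and "stop"; the sample records and the `min_gap` threshold are made up for illustration.

```python
def last_task_end_per_thread(records):
    """Map (worker, thread) to the latest stop time seen for that thread.

    Each record is assumed to be a dict with the hypothetical keys
    "worker", "thread", "start" and "stop" (times in seconds).
    """
    last_end = {}
    for rec in records:
        key = (rec["worker"], rec["thread"])
        last_end[key] = max(last_end.get(key, float("-inf")), rec["stop"])
    return last_end


def idle_threads(records, min_gap):
    """Return threads whose last task ended more than `min_gap` seconds
    before the last task overall."""
    last_end = last_task_end_per_thread(records)
    if not last_end:
        return []
    overall_end = max(last_end.values())
    return sorted(
        key for key, end in last_end.items() if overall_end - end > min_gap
    )


# Made-up records for illustration only, not taken from the reports above.
sample_records = [
    {"worker": "w1", "thread": 0, "start": 0.0, "stop": 10.0},
    {"worker": "w1", "thread": 1, "start": 0.0, "stop": 9.5},
    {"worker": "w2", "thread": 0, "start": 0.0, "stop": 3.0},
]

print(idle_threads(sample_records, min_gap=5.0))
```

Running the same check on records from both runs would show whether only the Afar run has a thread that goes idle early.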
