I was running a workflow using afar on Coiled and noticed that, at some point during the afar run, one worker thread stopped receiving tasks. In the task stream of the performance report for the afar version, the last thread stops getting tasks, while this does not happen in the non-afar version. Is this the expected behavior? What is actually happening here?
Note: the data is public, so this should work as a reproducible example.
Workflow without afar:
import dask.dataframe as dd
from dask.distributed import performance_report

ddf = dd.read_parquet(
    "s3://coiled-datasets/timeseries/20-years/parquet",
    storage_options={"anon": True, "use_ssl": True},
    split_row_groups=True,
    engine="pyarrow",
)
with performance_report(filename="read_pq_groupby_mean_CPU_pyarrow.html"):
    ddf.groupby('name').x.mean().compute()
Workflow with afar:
%%time
with afar.run, remotely:
    ddf_cpu = dd.read_parquet(
        "s3://coiled-datasets/timeseries/20-years/parquet",
        storage_options={"anon": True, "use_ssl": True},
        split_row_groups=True,
        engine="pyarrow",
    )
    res = ddf_cpu.groupby('name').x.mean().compute()
with performance_report(filename="read_pq_groupby_mean_CPU_pyarrow_afar.html"):
    res.result()
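To put numbers on the idle thread instead of reading it off the HTML report, here is a minimal sketch. It assumes task-stream records are dicts with `worker`, `thread`, and `startstops` fields, where each `startstops` entry has a `stop` timestamp. That is the shape I would expect from `Client.get_task_stream()`, but I have not verified it here. `last_activity_per_thread` is a hypothetical helper, and the records below are synthetic, not taken from the reports above.

```python
from collections import defaultdict


def last_activity_per_thread(records):
    """Return {(worker, thread): (task_count, last_stop_time)}.

    Assumed record shape (an assumption, not verified against a live cluster):
    {"worker": str, "thread": int, "startstops": [{"start": float, "stop": float}, ...]}
    """
    summary = defaultdict(lambda: [0, float("-inf")])
    for rec in records:
        entry = summary[(rec["worker"], rec["thread"])]
        entry[0] += 1  # one more task ran on this thread
        for interval in rec.get("startstops", ()):
            entry[1] = max(entry[1], interval["stop"])  # latest stop seen so far
    return {key: tuple(value) for key, value in summary.items()}


# Synthetic records for illustration only.
records = [
    {"worker": "tcp://w1", "thread": 1, "startstops": [{"start": 0.0, "stop": 1.0}]},
    {"worker": "tcp://w1", "thread": 1, "startstops": [{"start": 1.0, "stop": 9.0}]},
    {"worker": "tcp://w1", "thread": 2, "startstops": [{"start": 0.0, "stop": 1.5}]},
]
summary = last_activity_per_thread(records)
```

On a live cluster, the records would presumably come from `client.get_task_stream()` after the computation finishes. A thread whose last stop time is much earlier than the others is the one that went idle.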