Description
If there are more tasks than the task stream length limit (`distributed.scheduler.dashboard.tasks.task-stream-length`), then I think the calculation of "number of tasks", "compute time", etc. would be an underestimate, because the deque of tasks would roll over. When generating the performance report, we're just summing up data from the task stream:
`distributed/distributed/scheduler.py`, lines 7327 to 7335 at `51a63ea`
I'd propose that in `Scheduler.performance_report`, if `total_tasks == self.plugins[TaskStreamPlugin.name].buffer.maxlen` (or something like that, but written more nicely), we prepend a `>=` to every value in the performance report ("number of tasks: >= 100000", "compute time: >= 123456s", etc.).
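A minimal sketch of the idea, not the actual `distributed` code: a bounded deque stands in for the task stream buffer (with a small hypothetical limit instead of the real default), and a full buffer is treated as "may have rolled over", so reported totals get a `>=` prefix to mark them as lower bounds.

```python
from collections import deque

# Hypothetical stand-in for the task-stream-length config value
TASK_STREAM_LENGTH = 5

# Simulate the task stream buffer rolling over: more tasks than it can hold
buffer = deque(maxlen=TASK_STREAM_LENGTH)
for _ in range(8):
    buffer.append({"duration": 1.0})

total_tasks = len(buffer)
compute_time = sum(msg["duration"] for msg in buffer)

# If the buffer is full, older tasks may have been dropped, so the
# aggregates are lower bounds rather than exact totals.
maybe_rolled_over = total_tasks == buffer.maxlen
prefix = ">= " if maybe_rolled_over else ""

print(f"number of tasks: {prefix}{total_tasks}")
print(f"compute time: {prefix}{compute_time}s")
```

Here the buffer saw 8 tasks but only retains 5, so the report shows `number of tasks: >= 5` instead of silently undercounting.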