Originally raised by @robfitzgerald:
Feature
multiprocessing is currently used to parallelize operations, but these operations may actually run more slowly than synchronous code because:
- we need to slice up dataframes and copy them
- multiprocessing brings its own runtime overhead for forking and joining processes (i.e., pool and map)
to measure whether parallelism actually helps, we can branch the workflow on a "parallelism" argument, where:
- parallelism is None or 1: skip parallelism, do work directly on dataframe
- parallelism greater than 1: use current workflow
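A minimal sketch of the branching described above, not the project's actual code. It assumes a plain list stands in for the dataframe so the example stays self-contained; `run_workflow`, `split_chunks`, and `double_all` are hypothetical names, and the real per-chunk work and recombination step would differ.

```python
from multiprocessing import Pool
from typing import Callable, List, Optional


def split_chunks(rows: list, n: int) -> List[list]:
    """Split rows into at most n contiguous, near-equal slices (each slice is a copy)."""
    size, extra = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks take one additional row each.
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            chunks.append(rows[start:stop])
        start = stop
    return chunks


def double_all(rows: list) -> list:
    """Hypothetical stand-in for the real per-chunk work."""
    return [2 * r for r in rows]


def run_workflow(rows: list,
                 work: Callable[[list], list],
                 parallelism: Optional[int] = None) -> list:
    # parallelism is None or 1: skip multiprocessing, work on the data directly.
    if parallelism is None or parallelism == 1:
        return work(rows)
    if parallelism < 1:
        raise ValueError("parallelism must be None or a positive integer")
    # parallelism > 1: slice, map over a process pool, then recombine in order.
    chunks = split_chunks(rows, parallelism)
    with Pool(processes=parallelism) as pool:
        parts = pool.map(work, chunks)
    return [item for part in parts for item in part]
```

Calling `run_workflow(rows, double_all)` takes the serial path, while `run_workflow(rows, double_all, parallelism=4)` takes the pool path. On platforms that start worker processes by spawning, the parallel call needs to sit under an `if __name__ == "__main__":` guard, and `work` must be a picklable top-level function.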
Analysis
with these changes in place, we should time the workflow across a range of parallelism values to find out whether runtimes actually improve.
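One possible shape for that test, offered as a sketch rather than a proposed harness. The helper name `time_parallelism` and its parameters are hypothetical; `run` is assumed to be any callable that executes the workflow once for a given parallelism value.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def time_parallelism(run: Callable[[Optional[int]], object],
                     values: Iterable[Optional[int]],
                     repeats: int = 3) -> Dict[Optional[int], float]:
    """Return the best wall-clock time in seconds for each parallelism value."""
    results: Dict[Optional[int], float] = {}
    for p in values:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(p)  # one full execution of the workflow at this setting
            best = min(best, time.perf_counter() - start)
        results[p] = best
    return results
```

Taking the minimum over a few repeats reduces noise from other processes on the machine. Comparing the entries for `None` or `1` against larger values shows whether the pool overhead pays for itself on a given input size.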