Make parallelism optional #19

@dan-mccabe

Originally raised by @robfitzgerald:

Feature

multiprocessing is currently used to parallelize operations, but these operations may actually run slower than synchronous code, because:

  • we need to slice up dataframes and copy them
  • multiprocessing brings its own runtime overhead for forking and joining processes (i.e., Pool and map)

to review the performance benefits, we can branch the workflow on a "parallelism" argument:

  • parallelism is None or 1: skip multiprocessing and do the work directly on the dataframe
  • parallelism greater than 1: use the current multiprocessing workflow
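
A minimal sketch of how that branch could look. Everything named here (`run`, `split_into_chunks`, `double_rows`) is hypothetical and not taken from the codebase, and a plain list stands in for the dataframe so the sketch has no dependencies beyond the standard library. Raising on values below 1 is also an assumption; the issue only specifies None, 1, and greater than 1.

```python
from multiprocessing import Pool
from typing import Callable, List, Optional, Sequence


def split_into_chunks(rows: Sequence, n_chunks: int) -> List[list]:
    """Slice rows into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [list(rows[i:i + size]) for i in range(0, len(rows), size)]


def double_rows(chunk: list) -> list:
    """Hypothetical stand-in for the real per-chunk operation."""
    return [2 * value for value in chunk]


def run(
    rows: Sequence,
    work: Callable[[list], list],
    parallelism: Optional[int] = None,
) -> list:
    """Apply work to rows, using multiprocessing only when asked to."""
    if parallelism is None or parallelism == 1:
        # Serial path: no slicing, no copying, no process start-up cost.
        return work(list(rows))
    if parallelism < 1:
        # Assumption: values below 1 are treated as a caller error.
        raise ValueError("parallelism must be None or a positive integer")
    # Parallel path: slice the data, send each chunk to a worker process,
    # then stitch the per-chunk results back together in order.
    chunks = split_into_chunks(rows, parallelism)
    with Pool(processes=parallelism) as pool:
        results = pool.map(work, chunks)
    return [item for chunk in results for item in chunk]


if __name__ == "__main__":
    data = list(range(10))
    print(run(data, double_rows))                  # serial
    print(run(data, double_rows, parallelism=2))   # two worker processes
```

The worker function is defined at module level because `Pool.map` has to pickle it to send it to the worker processes, and the `__main__` guard keeps the example safe on platforms that start workers by spawning a fresh interpreter.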

Analysis

with these changes, we should benchmark a range of parallelism values to find out whether runtimes actually improve.
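
One rough way to run such a comparison is sketched below. `time_parallelism_values` and `fake_workflow` are hypothetical names, and the workflow is passed in as a callable that accepts the parallelism value, so the harness makes no assumptions about the real entry point. It keeps the best of several repeats per setting to dampen noise.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def time_parallelism_values(
    run: Callable[[Optional[int]], object],
    values: Iterable[Optional[int]],
    repeats: int = 3,
) -> Dict[Optional[int], float]:
    """Return the best wall-clock time in seconds for each parallelism value."""
    timings: Dict[Optional[int], float] = {}
    for parallelism in values:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(parallelism)
            best = min(best, time.perf_counter() - start)
        timings[parallelism] = best
    return timings


def fake_workflow(parallelism: Optional[int]) -> int:
    """Hypothetical stand-in; replace with a call into the real workflow."""
    return sum(range(100_000))


if __name__ == "__main__":
    results = time_parallelism_values(fake_workflow, [None, 1, 2, 4, 8])
    for parallelism, seconds in results.items():
        print(f"parallelism={parallelism}: {seconds:.4f}s")
```

Timings like these depend heavily on input size, so it is probably worth repeating the run on both small and large inputs before drawing conclusions.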
