
Race condition in WorkflowJob output callback when using MultithreadedJobExecutor #2068

Open
@AlexTate

Description

Continuation of #2003

Expected Behavior

Scatter outputs should be collected only once for each scatter job.

Actual Behavior

The patch provided by @GlassOfWhiskey in PR #2051 successfully enforces the expected behavior, but, as they note, it doesn't address the root cause. Under certain conditions, ReceiveScatterOutput.receive_scatter_output() may still be called twice for the same scatter job output within a narrow window of time.
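For contrast with a root-cause fix, here is a receiver-side guard that ignores duplicate deliveries. It is a hypothetical sketch, not the actual code from PR #2051: the class `IdempotentScatterReceiver` and its methods are illustrative names, and the only assumption is that each scatter output is identified by an integer index.

```python
import threading


class IdempotentScatterReceiver:
    """Hypothetical receiver that records each scatter index at most once.

    Illustrative only: the class and method names are assumptions, not cwltool's API.
    """

    def __init__(self, total: int) -> None:
        self.outputs: list = [None] * total
        self._seen: set = set()
        self._lock = threading.Lock()

    def receive_scatter_output(self, index: int, value: object) -> bool:
        """Store value for index; return False if index was already recorded."""
        with self._lock:
            if index in self._seen:
                return False  # duplicate delivery: ignore it
            self._seen.add(index)
            self.outputs[index] = value
            return True

    def completed(self) -> int:
        """Number of distinct scatter indices received so far."""
        with self._lock:
            return len(self._seen)


receiver = IdempotentScatterReceiver(total=3)
print(receiver.receive_scatter_output(0, "a"))  # True
print(receiver.receive_scatter_output(0, "a"))  # False: duplicate ignored
print(receiver.completed())                     # 1
```

A guard like this hides the symptom, but the upstream callback still fires twice, which is the underlying problem this issue is about.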

The root cause lies within WorkflowJob and the conditions it uses to determine whether .do_output_callback() should be called. This method is intended to be called from either WorkflowJob.receive_output() OR WorkflowJob.job(), but during the race condition it is called from both.

WorkflowJob.job() runs in one thread, while WorkflowJob.receive_output() is the callback bound to a work unit being executed by one of the TaskQueue workers, i.e. it executes in a separate scatter job thread. The race is a check-then-act on did_callback: the .receive_output() thread calls do_output_callback(), which sets self.did_callback = True only later in its body. While that thread is still inside the method, the .job() thread reads did_callback, sees that it is still False, and so also calls do_output_callback().
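The interleaving described above can be reproduced deterministically with a simplified model. This is a hypothetical sketch: `RacyJob`, its `calls` list, and the `threading.Barrier` that holds both threads inside the callback are assumptions made for the demo, and the real WorkflowJob also checks step completion. The barrier stands in for the narrow window between entering do_output_callback() and setting did_callback.

```python
import threading


class RacyJob:
    """Hypothetical stand-in for WorkflowJob's unsynchronized check-then-act."""

    def __init__(self) -> None:
        self.did_callback = False
        self.calls: list = []
        # Demo-only: neither caller can set the flag until both are inside the callback.
        self._rendezvous = threading.Barrier(2, timeout=5)

    def do_output_callback(self, caller: str) -> None:
        self.calls.append(caller)   # the real callback would deliver outputs here
        self._rendezvous.wait()     # simulates work done before the flag is set
        self.did_callback = True    # flag is set only after the work

    def receive_output(self) -> None:  # runs on a TaskQueue worker thread
        if not self.did_callback:
            self.do_output_callback("receive_output")

    def job(self) -> None:             # runs on the thread driving the workflow
        if not self.did_callback:
            self.do_output_callback("job")


racy = RacyJob()
worker = threading.Thread(target=racy.receive_output)
worker.start()
racy.job()  # did_callback is still False because the worker is blocked at the barrier
worker.join()
print(sorted(racy.calls))  # ['job', 'receive_output']
```

Both threads pass the `if not self.did_callback` check before either one sets the flag, so the callback runs twice regardless of which thread checks first.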

The relevant shared state is:

  • WorkflowJob.did_callback
  • WorkflowJob.steps where completed==True
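A minimal sketch of one possible root-cause fix, assuming a simplified model of WorkflowJob (`GuardedJob`, its `completed` list, and `_claim_callback` are hypothetical names, not cwltool code): both pieces of shared state listed above are read, and did_callback is set, inside a single lock, so exactly one caller claims the callback. The callback itself runs outside the lock so that a callback chain which re-enters the job cannot deadlock on it.

```python
import threading
from typing import Callable, Dict, Optional


class GuardedJob:
    """Hypothetical, simplified stand-in for WorkflowJob (not cwltool's code)."""

    def __init__(self, step_count: int, output_callback: Callable[[Dict[int, object]], None]) -> None:
        self.completed = [False] * step_count  # models each step's completed flag
        self.outputs: Dict[int, object] = {}
        self.did_callback = False
        self._lock = threading.Lock()
        self._output_callback = output_callback

    def _claim_callback(self) -> Optional[Dict[int, object]]:
        """Atomically check the shared state and claim the callback if it is due."""
        with self._lock:
            if self.did_callback or not all(self.completed):
                return None
            self.did_callback = True   # claimed before any work is done
            return dict(self.outputs)  # snapshot taken under the lock

    def _maybe_fire(self) -> None:
        snapshot = self._claim_callback()
        if snapshot is not None:
            self._output_callback(snapshot)  # invoked outside the lock

    def receive_output(self, step: int, value: object) -> None:
        """Called from worker threads as each step finishes."""
        with self._lock:
            self.outputs[step] = value
            self.completed[step] = True
        self._maybe_fire()

    def job(self) -> None:
        """Called from the driving thread; may race with receive_output()."""
        self._maybe_fire()


fired: list = []
guarded = GuardedJob(step_count=8, output_callback=fired.append)
threads = [threading.Thread(target=guarded.receive_output, args=(i, i * i)) for i in range(8)]
threads.append(threading.Thread(target=guarded.job))
for t in threads:
    t.start()
for t in threads:
    t.join()
guarded.job()  # a late call after completion is a no-op
print(len(fired))  # 1
```

The key design choice is that the flag is set at claim time rather than at the end of the callback, which closes the window the race exploits.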

I wanted to be sure that both methods were executing the same callback, which can be a little tricky with all of the nested functools.partial objects that obscure the object to which each level is bound. If you unwind the callback chain, you'll see it is the same in both branches:

WorkflowJob.receive_output()  # [workflow scatterletters_#]
	WorkflowStep.receive_output()  # [simple-simple-scatter.cwl#scatterletters]
		ReceiveScatterOutput.receive_scatter_output()
			WorkflowJob.receive_output()  # [workflow ]
				MultithreadedJobExecutor.output_callback()
WorkflowJob.job()  # [workflow scatterletters_#]
	WorkflowStep.receive_output()  # [simple-simple-scatter.cwl#scatterletters]
		ReceiveScatterOutput.receive_scatter_output()
			WorkflowJob.receive_output()  # [workflow ]
				MultithreadedJobExecutor.output_callback()
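The unwinding can be partly automated. This is a hypothetical helper, not part of cwltool: `unwrap_partial` peels `functools.partial` layers using the documented `func`, `args` and `keywords` attributes, and `same_callback` compares the underlying bound methods by owner identity. `FakeWorkflowJob` exists only to build example callbacks. It covers a single callback's partial layers; following the chain across objects would still need knowledge of which attribute holds each level's callback.

```python
import functools
from typing import Any, Callable, Dict, Tuple


def unwrap_partial(cb: Callable[..., Any]) -> Tuple[Callable[..., Any], tuple, Dict[str, Any]]:
    """Peel functools.partial layers, returning (target, positional args, keywords).

    Arguments are merged the way the partials would apply them:
    inner positional args come first, and outer keywords override inner ones.
    """
    args: tuple = ()
    keywords: Dict[str, Any] = {}
    while isinstance(cb, functools.partial):
        args = cb.args + args
        keywords = {**cb.keywords, **keywords}
        cb = cb.func
    return cb, args, keywords


def same_callback(a: Callable[..., Any], b: Callable[..., Any]) -> bool:
    """True if both wrap the same function on the same object with the same arguments."""
    fa, args_a, kw_a = unwrap_partial(a)
    fb, args_b, kw_b = unwrap_partial(b)
    same_owner = getattr(fa, "__self__", None) is getattr(fb, "__self__", None)
    return same_owner and fa == fb and args_a == args_b and kw_a == kw_b


class FakeWorkflowJob:
    """Hypothetical stand-in used only to build example callbacks."""

    def __init__(self, name: str) -> None:
        self.name = name

    def receive_output(self, step_id: str, outputs: dict, status: str) -> None:
        pass


parent = FakeWorkflowJob("workflow ")
from_receive_output = functools.partial(parent.receive_output, "scatterletters")
from_job = functools.partial(functools.partial(parent.receive_output), "scatterletters")
print(same_callback(from_receive_output, from_job))  # True
```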

Workflow Code

See #2003
