-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Description
Description
When wait_for_completion=True in EmrCreateJobFlowOperator, the operator code does not actually wait for the cluster to complete successfully before returning a success state. Instead, a success state is returned as soon as the cluster starts running. This can result in the task succeeding even if the cluster is terminated with errors after it begins running.
I believe this is due to this line of code that assigns the "WAIT_FOR_COMPLETION" WaitPolicy to the waiter. This corresponds to the "job_for_waiting" wait policy with which the waiter will only wait for the cluster to start running before returning a success state.
If the user wants the waiter to wait until the cluster completes, WAIT_FOR_STEPS_COMPLETION corresponding to the "job_flow_terminated" wait policy must be used. The operator has hard coded the "job_for_waiting" wait policy, so the user cannot configure the wait policy.
I propose adding a wait_policy parameter to the operator which allows the user to specify which wait policy they would prefer to use.
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct