Improve SUB_WORKFLOW reliability, recovery, and scalability#973
Improve SUB_WORKFLOW reliability, recovery, and scalability#973rajeshwar-nu wants to merge 29 commits into
Conversation
|
Hi @vishesh-orkes, @mp-orkes , @nthmost-orkes , can I please get some thoughts on this 🙏🏻 |
99695ae to
e0d362c
Compare
|
@rajeshwar-nu can you address the failing tests otherwise this PR LGTM |
e0d362c to
45c5dff
Compare
After making SUB_WORKFLOW an async system task, many integration
tests need an explicit pop+execute of the SUB_WORKFLOW queue and a
sweep() to advance parent or child workflows that were previously
driven inline by the decide loop.
- HierarchicalForkJoinSubworkflow{Rerun,Restart,Retry}Spec: sweep the
mid-level workflow before polling its integration_task_2 so the
task gets scheduled after the async SUB_WORKFLOW executes.
- DoWhileSpec: pop+execute the SUB_WORKFLOW spawned by the DoWhile
iteration and sweep the resulting subworkflow so its first task
is scheduled.
- ForkJoinSpec: pop+execute the SUB_WORKFLOW that a retry schedules;
sweep the nested subworkflow before asserting on its first task.
- NestedForkJoinSubWorkflowSpec: pop+execute the SUB_WORKFLOW that
restart/retry schedules on the parent workflow.
- SubWorkflow{Rerun,Restart,Retry}Spec: after rerun/restart/retry on
the root or mid-level, pop+execute each newly scheduled
SUB_WORKFLOW and sweep the corresponding child workflow so its
first task is scheduled.
|
@v1r3n thanks for the review, tests are fixed now |
|
I see the sub-workflow ids are pre-fetched. I think an alternative to it might be to accept workflow Id as part of the start workflow request. The ids can be generated using IdGenerator attached to the parent workflow and then start the sub-workflow. This avoids the id reservation logic and keeps things simpler. This also means as a user you can pass the id as part of the start workflow request. I would then go ahead and update IdGenerator to add a validate method that ensures that the user supplied id is in a valid format. |
Generating subworkflow_id in parent workflow does not scale well for dynamic fork-join with many forks (around 1k forks). The idea behind the reservation is to keep things async, including generation of subworkflow-id, scheduling and starting the subworkflow. The reservation can quickly be created, which allows huge dyn fork-join fan-outs to complete quickly (preventing the the leased lock from getting timeout). Otherwise, large no. of forks executing sequentially within the single lease of parent cannot complete in time, leaving the workflow vulnerable to get inconsistent updates by a subsequent decider call |
|
I think we can avoid having the id reservation. Traditionally, ids are blocked and need to be accounted for, in case of conductor each id is a random UUID, so instead of doing reservation, we can just generate an id on demand and attach to a workflow (we will need to update workflow executor to be able to accept id as part of start request, which should only be available internally for sub-workflow flow and not for anything else). This avoids the complexity, and if for some reason if the system fails, the next attempt will generate new id, we don't need to do reservation. How about this approach? |
what about generating them in parallel? we are talking about generating 10K UUIDs. |
|
@v1r3n I changed the approach here. Instead of using a reservation store, the flow is now:
|
Summary
This PR improves the reliability, recovery behavior, and scalability of
SUB_WORKFLOWexecution.It addresses a failure mode where parent workflows can be left with sub-workflow tasks that are persisted but not cleanly attached or recoverable, especially under large fanout, nested sub-workflow creation, or transient queue/persistence instability.
It also reduces the amount of heavy child-workflow startup work performed on the critical parent execution path, which makes
SUB_WORKFLOWorchestration behave better under load.Why
SUB_WORKFLOWis an orchestration primitive, not a worker-polled task.In the previous model:
SUB_WORKFLOWtasks could remain inSCHEDULEDwithout a reliable recovery pathSUB_WORKFLOWtasks could be too slow for prompt repairFor large
dyn fork-join -> subworkflow -> dyn fork-join -> subworkflowworkloads, that made the system more fragile than it needed to be.This PR changes
SUB_WORKFLOWlaunch semantics to better match its actual role:What Changed
Idempotent child launch
Faster and lighter parent attachment
SUB_WORKFLOWlaunch as an async orchestration stepsubWorkflowIdThis reduces parent-path blocking and improves scalability for nested fanout workloads.
Recovery behavior
SCHEDULEDSUB_WORKFLOWtasks withoutsubWorkflowIdto retry launch instead of dead-endingReservation lifecycle management
Faster revisit for active SUB_WORKFLOW tasks
SUB_WORKFLOWtasks dedicated postpone behaviorworkflowOffsetTimeoutfor bothSCHEDULEDandIN_PROGRESSSUB_WORKFLOWtasksThis improves reliability by reducing the time unresolved sub-workflow tasks can sit before being revisited.
Reliability and Scalability Impact
This PR improves reliability by:
SCHEDULEDstatesSUB_WORKFLOWtasks soonerThis PR improves scalability by:
Backward Compatibility
This PR intentionally changes
SUB_WORKFLOWbehavior:SUB_WORKFLOWlaunch now follows the async/idempotent attach model implemented hereWorkflowExecutor.startWorkflowIdempotent(...)now returns aWorkflowModelThis is both a behavioral change and an SPI change, so mixed-version rollout should be treated carefully.
Tests
Added/updated coverage for:
SCHEDULEDtasks withoutsubWorkflowIdSUB_WORKFLOWtasks