
Improve SUB_WORKFLOW reliability, recovery, and scalability#973

Open
rajeshwar-nu wants to merge 29 commits into conductor-oss:main from steeleye:improvments/dyn-fork-join-sw

Conversation

@rajeshwar-nu
Contributor

@rajeshwar-nu rajeshwar-nu commented Apr 4, 2026

Summary

This PR improves the reliability, recovery behavior, and scalability of SUB_WORKFLOW execution.

It addresses a failure mode where parent workflows can be left with sub-workflow tasks that are persisted but not cleanly attached or recoverable, especially under large fanout, nested sub-workflow creation, or transient queue/persistence instability.

It also reduces the amount of heavy child-workflow startup work performed on the critical parent execution path, which makes SUB_WORKFLOW orchestration behave better under load.

Why

SUB_WORKFLOW is an orchestration primitive, not a worker-polled task.

In the previous model:

  • child launch and parent attachment were too tightly coupled
  • retries did not have a stable child identity to reattach to
  • partially launched SUB_WORKFLOW tasks could remain in SCHEDULED without a reliable recovery path
  • parent/child attachment could lag behind child creation
  • revisit timing for active SUB_WORKFLOW tasks could be too slow for prompt repair
  • large dynamic fanout of nested sub-workflows put too much synchronous pressure on orchestration

For large dyn fork-join -> subworkflow -> dyn fork-join -> subworkflow workloads, that made the system more fragile than it needed to be.

This PR changes SUB_WORKFLOW launch semantics to better match its actual role:

  • durable parent-owned child identity
  • safe retry and reattach
  • faster parent attachment
  • quicker revisit for unresolved active sub-workflow tasks
  • less synchronous orchestration pressure on the parent path

What Changed

Idempotent child launch

  • reserve a stable child workflow id per owning parent workflow task
  • reuse the same child id across retries instead of risking duplicate child creation
  • use execution-store truth for child existence checks instead of index fallback
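
The reuse-or-create flow above can be sketched as follows. This is a minimal illustration, not Conductor's actual API: the execution store is simulated with a Map, and the names (IdempotentLaunch, launchChild, executionStore) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentLaunch {
    // Simulated execution store: childId -> status. In the real system this
    // would be the persistence layer, which is the sole source of truth for
    // existence checks (no index fallback).
    final Map<String, String> executionStore = new HashMap<>();

    // Launches the child under its reserved id, or reattaches if the child
    // already exists, so a retry after a partial failure never creates a
    // duplicate child workflow.
    String launchChild(String reservedChildId) {
        if (!executionStore.containsKey(reservedChildId)) {
            executionStore.put(reservedChildId, "RUNNING"); // create with reserved id
        }
        return reservedChildId; // same child id on every retry
    }
}
```

Because the id is stable per owning parent task, calling launchChild twice is a no-op the second time, which is exactly what makes the retry path safe.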

Faster and lighter parent attachment

  • treat SUB_WORKFLOW launch as an async orchestration step
  • create or reattach the child workflow and attach the parent task as soon as the child record exists
  • avoid waiting for the child workflow’s initial inline expansion before persisting subWorkflowId

This reduces parent-path blocking and improves scalability for nested fanout workloads.

Recovery behavior

  • allow SCHEDULED SUB_WORKFLOW tasks without subWorkflowId to retry launch instead of dead-ending
  • preserve reattach behavior after partial persistence failures
  • make launch failures explicit on the task instead of leaving an ambiguous blank scheduled state
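
The recovery rule above amounts to a simple predicate: a SCHEDULED SUB_WORKFLOW task with a blank subWorkflowId is retriable rather than dead-ended. A hedged sketch, with illustrative names rather than Conductor's actual API:

```java
public class LaunchRecovery {
    // A blank subWorkflowId on a SCHEDULED task means the launch never
    // completed; such tasks should retry the launch instead of sitting in
    // an ambiguous scheduled state forever.
    static boolean shouldRetryLaunch(String taskStatus, String subWorkflowId) {
        return "SCHEDULED".equals(taskStatus)
                && (subWorkflowId == null || subWorkflowId.isEmpty());
    }
}
```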

Reservation lifecycle management

  • add owned reservation cleanup for cancel/delete flows
  • support both single-task reservation removal and bulk workflow-owned cleanup
  • store Redis reservations in a workflow-owned hash for cheaper lookup and cleanup
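
The workflow-owned reservation layout can be pictured as one Redis hash per parent workflow, keyed by parent task id, so a single-task removal is an HDEL and bulk cleanup is a DEL of the whole hash. The sketch below simulates that with nested Maps; the key prefix and class names are illustrative assumptions, not the PR's actual identifiers.

```java
import java.util.HashMap;
import java.util.Map;

public class ReservationStore {
    // Simulated Redis: hash key -> (field -> value).
    final Map<String, Map<String, String>> hashes = new HashMap<>();

    String key(String parentWorkflowId) {
        return "subwf_reservations:" + parentWorkflowId; // one hash per workflow
    }

    void reserve(String parentWorkflowId, String parentTaskId, String childId) {
        hashes.computeIfAbsent(key(parentWorkflowId), k -> new HashMap<>())
              .put(parentTaskId, childId);               // HSET
    }

    void removeTask(String parentWorkflowId, String parentTaskId) {
        Map<String, String> h = hashes.get(key(parentWorkflowId));
        if (h != null) h.remove(parentTaskId);           // HDEL: single-task removal
    }

    void removeWorkflow(String parentWorkflowId) {
        hashes.remove(key(parentWorkflowId));            // DEL: bulk workflow cleanup
    }
}
```

Grouping reservations under a per-workflow key is what makes cancel/delete cleanup a single-key operation instead of a scan.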

Faster revisit for active SUB_WORKFLOW tasks

  • give active SUB_WORKFLOW tasks dedicated postpone behavior
  • use workflowOffsetTimeout for both SCHEDULED and IN_PROGRESS SUB_WORKFLOW tasks
  • avoid inheriting generic worker-oriented postpone behavior for orchestration tasks

This improves reliability by reducing the time unresolved sub-workflow tasks can sit before being revisited.
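
The dedicated postpone behavior can be sketched as a small selection rule: active SUB_WORKFLOW tasks (SCHEDULED or IN_PROGRESS) use workflowOffsetTimeout, while worker-polled tasks keep their generic postpone. The method and parameter names here are illustrative, not Conductor's actual API.

```java
public class PostponePolicy {
    // Orchestration tasks should be revisited on the workflow offset timeout,
    // not on the long worker-oriented postpone inherited by polled tasks.
    static long postponeSeconds(boolean isSubWorkflow, String status,
                                long workflowOffsetTimeout, long workerPostpone) {
        boolean active = "SCHEDULED".equals(status) || "IN_PROGRESS".equals(status);
        return (isSubWorkflow && active) ? workflowOffsetTimeout : workerPostpone;
    }
}
```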

Reliability and Scalability Impact

This PR improves reliability by:

  • making child launch retry-safe
  • making partial launch/attach failures recoverable
  • reducing ambiguous SCHEDULED states
  • revisiting unresolved SUB_WORKFLOW tasks sooner

This PR improves scalability by:

  • reducing heavy inline child startup work on the parent path
  • shortening the parent/child attachment gap
  • making nested fanout workloads less sensitive to transient backend or queue issues

Backward Compatibility

This PR intentionally changes SUB_WORKFLOW behavior:

  • SUB_WORKFLOW launch now follows the async/idempotent attach model implemented here
  • WorkflowExecutor.startWorkflowIdempotent(...) now returns a WorkflowModel
  • persistence implementations must support sub-workflow id reservation APIs

This is both a behavioral change and an SPI change, so mixed-version rollout should be treated carefully.

Tests

Added/updated coverage for:

  • idempotent create-vs-reattach behavior
  • execution-store-only existence checks
  • transient retry for SCHEDULED tasks without subWorkflowId
  • faster parent attach after child creation
  • reservation cleanup on task/workflow deletion
  • backend reservation behavior across persistence implementations
  • postpone behavior for active SUB_WORKFLOW tasks

@rajeshwar-nu rajeshwar-nu marked this pull request as ready for review April 4, 2026 07:25
@rajeshwar-nu rajeshwar-nu changed the title Make sub-workflow launch recoverable for dyn fork-join fanout Make sub-workflow launches durable for dyn fork-join fanouts Apr 4, 2026
@rajeshwar-nu rajeshwar-nu changed the title Make sub-workflow launches durable for dyn fork-join fanouts Improve SUB_WORKFLOW reliability, recovery, and scalability Apr 5, 2026
@v1r3n v1r3n requested a review from mp-orkes April 7, 2026 01:43
@rajeshwar-nu
Contributor Author

Hi @vishesh-orkes, @mp-orkes, @nthmost-orkes, can I please get some thoughts on this 🙏🏻
Thanks

@rajeshwar-nu rajeshwar-nu force-pushed the improvments/dyn-fork-join-sw branch 2 times, most recently from 99695ae to e0d362c Compare April 20, 2026 06:03
@v1r3n
Collaborator

v1r3n commented Apr 20, 2026

@rajeshwar-nu can you address the failing tests? Otherwise this PR LGTM

@rajeshwar-nu rajeshwar-nu force-pushed the improvments/dyn-fork-join-sw branch from e0d362c to 45c5dff Compare April 20, 2026 07:32
After making SUB_WORKFLOW an async system task, many integration
tests need an explicit pop+execute of the SUB_WORKFLOW queue and a
sweep() to advance parent or child workflows that were previously
driven inline by the decide loop.

- HierarchicalForkJoinSubworkflow{Rerun,Restart,Retry}Spec: sweep the
  mid-level workflow before polling its integration_task_2 so the
  task gets scheduled after the async SUB_WORKFLOW executes.
- DoWhileSpec: pop+execute the SUB_WORKFLOW spawned by the DoWhile
  iteration and sweep the resulting subworkflow so its first task
  is scheduled.
- ForkJoinSpec: pop+execute the SUB_WORKFLOW that a retry schedules;
  sweep the nested subworkflow before asserting on its first task.
- NestedForkJoinSubWorkflowSpec: pop+execute the SUB_WORKFLOW that
  restart/retry schedules on the parent workflow.
- SubWorkflow{Rerun,Restart,Retry}Spec: after rerun/restart/retry on
  the root or mid-level, pop+execute each newly scheduled
  SUB_WORKFLOW and sweep the corresponding child workflow so its
  first task is scheduled.
@rajeshwar-nu
Contributor Author

@v1r3n thanks for the review, tests are fixed now

@v1r3n
Collaborator

v1r3n commented Apr 21, 2026

I see the sub-workflow ids are pre-fetched. I think an alternative to it might be to accept workflow Id as part of the start workflow request. The ids can be generated using IdGenerator attached to the parent workflow and then start the sub-workflow. This avoids the id reservation logic and keeps things simpler. This also means as a user you can pass the id as part of the start workflow request. I would then go ahead and update IdGenerator to add a validate method that ensures that the user supplied id is in a valid format.

@rajeshwar-nu
Contributor Author

rajeshwar-nu commented Apr 21, 2026

I see the sub-workflow ids are pre-fetched. I think an alternative to it might be to accept workflow Id as part of the start workflow request. The ids can be generated using IdGenerator attached to the parent workflow and then start the sub-workflow. This avoids the id reservation logic and keeps things simpler. This also means as a user you can pass the id as part of the start workflow request. I would then go ahead and update IdGenerator to add a validate method that ensures that the user supplied id is in a valid format.

Generating subworkflow_id in the parent workflow does not scale well for dynamic fork-join with many forks (around 1k forks). The idea behind the reservation is to keep everything async, including generation of the subworkflow id, scheduling, and starting the subworkflow. The reservation can be created quickly, which allows huge dyn fork-join fan-outs to complete quickly (preventing the leased lock from timing out). Otherwise, a large number of forks executing sequentially within the parent's single lease cannot complete in time, leaving the workflow vulnerable to inconsistent updates from a subsequent decider call

@v1r3n
Collaborator

v1r3n commented Apr 26, 2026

I think we can avoid having the id reservation. Traditionally, ids are blocked and need to be accounted for; in Conductor's case each id is a random UUID, so instead of doing reservation we can just generate an id on demand and attach it to a workflow (we will need to update the workflow executor to accept an id as part of the start request, which should only be available internally for the sub-workflow flow and not for anything else).

This avoids the complexity, and if for some reason the system fails, the next attempt will generate a new id; we don't need to do reservation.

How about this approach?

@v1r3n
Collaborator

v1r3n commented May 2, 2026

I see the sub-workflow ids are pre-fetched. I think an alternative to it might be to accept workflow Id as part of the start workflow request. The ids can be generated using IdGenerator attached to the parent workflow and then start the sub-workflow. This avoids the id reservation logic and keeps things simpler. This also means as a user you can pass the id as part of the start workflow request. I would then go ahead and update IdGenerator to add a validate method that ensures that the user supplied id is in a valid format.

Generating subworkflow_id in the parent workflow does not scale well for dynamic fork-join with many forks (around 1k forks). The idea behind the reservation is to keep everything async, including generation of the subworkflow id, scheduling, and starting the subworkflow. The reservation can be created quickly, which allows huge dyn fork-join fan-outs to complete quickly (preventing the leased lock from timing out). Otherwise, a large number of forks executing sequentially within the parent's single lease cannot complete in time, leaving the workflow vulnerable to inconsistent updates from a subsequent decider call

What about generating them in parallel? We are talking about generating 10K UUIDs.

@rajeshwar-nu
Contributor Author

@v1r3n I changed the approach here. Instead of using a reservation store, the flow is now:

  1. When a SUB_WORKFLOW task is scheduled, Conductor persists it as SCHEDULED.
  2. The SUB_WORKFLOW task is pushed to the system task queue for async processing.
  3. The async system task worker picks up the task and starts the child workflow through an idempotent start path.
  4. The child workflow id is deterministic for the parent task attempt: parentWorkflowId + parentTaskId + retryCount. It is still formatted as a UUID, so storage/backend compatibility is preserved.
  5. Because the child id is deterministic, retries/restarts for the same parent task attempt target the same child workflow id. If the child already exists, the idempotent starter reuses it; if it does not exist, it creates it.
  6. The parent SUB_WORKFLOW task is only attached to the child workflow id after the child workflow record exists, avoiding a long UI/API window where the parent points to a non-existent child.
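
The deterministic child id in step 4 can be sketched with Java's name-based UUID support, which hashes a seed into a stable, UUID-formatted value. This is a hedged illustration: the seed layout and class name are assumptions, not necessarily the PR's exact implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class ChildWorkflowId {
    // Derives a stable, UUID-formatted child workflow id from the parent
    // workflow id, parent task id, and retry count, so every retry of the
    // same parent task attempt targets the same child workflow.
    static String of(String parentWorkflowId, String parentTaskId, int retryCount) {
        String seed = parentWorkflowId + ":" + parentTaskId + ":" + retryCount;
        return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

Because the id is a pure function of the parent task attempt, the idempotent starter can decide create-vs-reattach without any reservation store, while the UUID format keeps existing storage backends compatible.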
