Next task is not scheduled and workflow stalls forever #1132

@and-lg

Conductor Version

3.21.23

Brief Description

Subworkflows created via FORK_JOIN_DYNAMIC can stall indefinitely after an HTTP task completes. The HTTP task completes successfully (status 200 with the expected payload), but the next task is never scheduled. The subworkflow therefore stays in the RUNNING state, and the parent workflow is blocked as well. Manually pausing and resuming the subworkflow from the Conductor UI lets it continue and complete normally.

Definition of the Stalled Subworkflow

{
  "createTime": 0,
  "updateTime": 1778782683427,
  "name": "postAdhocTest_On_Input_OR",
  "description": "workflow to fire adhoc test on single Optical Route.",
  "version": 2,
  "tasks": [
    {
      "name": "check_adhoc_test_type",
      "taskReferenceName": "check_adhoc_test_type_ref",
      "inputParameters": {
        "adhoc_test_type": "${workflow.input.AdhocTestType}"
      },
      "type": "SWITCH",
      "decisionCases": {
        "Case2": [
          {
            "name": "httpApiCall",
            "taskReferenceName": "postAdhoc_case2_ref",
            "inputParameters": {
              "http_request": {
                "connectionTimeOut": 90000,
                "readTimeOut": 90000,
                "contentType": "application/json",
                "uri": "some internal uri",
                "headers": {
                  "x-Token-Roles": "${workflow.input.UserRoles}",
                  "x-Token-Username": "${workflow.input.UserName}",
                  "x-Token-Fms-Scope": "${workflow.input.UserScope}"
                },
                "body": "${workflow.input.Payload.payLoad}",
                "method": "POST"
              }
            },
            "type": "HTTP",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": true,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          },
          {
            "name": "set_PostAdhoc_case2_output",
            "taskReferenceName": "set_PostAdhoc_case2_output_ref",
            "inputParameters": {
              "postAdhoc_output_status": "${postAdhoc_case2_ref.status}",
              "postAdhoc_output_promiseId": "${postAdhoc_case2_ref.output.response.body}"
            },
            "type": "SET_VARIABLE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          }
        ],
        "Case1": [
          {
            "name": "httpApiCall",
            "taskReferenceName": "get_testConfig_ref",
            "inputParameters": {
              "http_request": {
                "connectionTimeOut": 90000,
                "readTimeOut": 90000,
                "contentType": "application/json",
                "uri": "some internal uri",
                "headers": {
                  "x-Token-Roles": "${workflow.input.UserRoles}",
                  "x-Token-Username": "${workflow.input.UserName}",
                  "x-Token-Fms-Scope": "${workflow.input.UserScope}"
                },
                "method": "GET"
              }
            },
            "type": "HTTP",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": true,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          },
          {
            "name": "changetestconfig_response",
            "taskReferenceName": "changetestconfig_response_ref",
            "inputParameters": {
              "evaluatorType": "javascript",
              "expression": "function e() { var testConfigPayload = ${get_testConfig_ref.output.response.body.payLoad}; testConfigPayload.WavelengthsUsed = ${workflow.input.WavelengthsUsed}; testConfigPayload.MeasurementType = \"${workflow.input.MeasurementType}\"; return JSON.stringify(testConfigPayload); } e();"
            },
            "type": "INLINE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          },
          {
            "name": "httpApiCall",
            "taskReferenceName": "postAdhoc_case1_ref",
            "inputParameters": {
              "http_request": {
                "connectionTimeOut": 90000,
                "readTimeOut": 90000,
                "contentType": "application/json",
                "uri": "some internal uri",
                "headers": {
                  "x-Token-Roles": "${workflow.input.UserRoles}",
                  "x-Token-Username": "${workflow.input.UserName}",
                  "x-Token-Fms-Scope": "${workflow.input.UserScope}"
                },
                "body": {
                  "name": "${workflow.input.TestConfigName}",
                  "payload": "${changetestconfig_response_ref.output.result}"
                },
                "method": "POST"
              }
            },
            "type": "HTTP",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": true,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          },
          {
            "name": "set_PostAdhoc_case1_output",
            "taskReferenceName": "set_PostAdhoc_case1_output_ref",
            "inputParameters": {
              "postAdhoc_output_status": "${postAdhoc_case1_ref.status}",
              "postAdhoc_output_promiseId": "${postAdhoc_case1_ref.output.response.body}"
            },
            "type": "SET_VARIABLE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          }
        ]
      },
      "defaultCase": [
        {
          "name": "adhoc_invalid_test_type_response",
          "taskReferenceName": "adhoc_invalid_test_type_response_ref",
          "inputParameters": {
            "evaluatorType": "javascript",
            "expression": "function e() { return {\"resultId\":\"\",\"opticalRouteName\":\"${workflow.input.OpticalRouteName}\",\"opticalRouteId\":${workflow.input.OpticalRouteId},\"portId\":\"\",\"portNumber\":\"\",\"testTime\":\"\",\"status\":\"FAILED\",\"linkLength\":\"\",\"linkLoss\":\"\",\"wavelength\":\"\",\"globalStarRating\":\"\",\"orlStarRating\":\"\",\"lossStarRating\":\"\",\"globalVerdict\":\"\",\"globalDeviationVerdict\":\"\",\"completionStatus\":\"\",\"message\":\"ADHOC_CALL_FAILED\" }} e();"
          },
          "type": "INLINE",
          "decisionCases": {},
          "defaultCase": [],
          "forkTasks": [],
          "startDelay": 0,
          "joinOn": [],
          "optional": false,
          "defaultExclusiveJoinTask": [],
          "asyncComplete": false,
          "loopOver": [],
          "onStateChange": {},
          "permissive": false
        },
        {
          "name": "error_check_postadhoc_type",
          "taskReferenceName": "error_check_postadhoc_type_ref",
          "inputParameters": {
            "terminationStatus": "COMPLETED",
            "workflowOutput": "${adhoc_invalid_test_type_response_ref.output}"
          },
          "type": "TERMINATE",
          "decisionCases": {},
          "defaultCase": [],
          "forkTasks": [],
          "startDelay": 0,
          "joinOn": [],
          "optional": false,
          "defaultExclusiveJoinTask": [],
          "asyncComplete": false,
          "loopOver": [],
          "onStateChange": {},
          "permissive": false
        }
      ],
      "forkTasks": [],
      "startDelay": 0,
      "joinOn": [],
      "optional": false,
      "defaultExclusiveJoinTask": [],
      "asyncComplete": false,
      "loopOver": [],
      "evaluatorType": "value-param",
      "expression": "adhoc_test_type",
      "onStateChange": {},
      "permissive": false
    },
    {
      "name": "readPostAdhoc_output",
      "taskReferenceName": "readPostAdhoc_output_ref",
      "inputParameters": {
        "value": "${workflow.variables.postAdhoc_output_status}",
        "evaluatorType": "javascript",
        "expression": "function e() { return $.value } e();"
      },
      "type": "INLINE",
      "decisionCases": {},
      "defaultCase": [],
      "forkTasks": [],
      "startDelay": 0,
      "joinOn": [],
      "optional": false,
      "defaultExclusiveJoinTask": [],
      "asyncComplete": false,
      "loopOver": [],
      "onStateChange": {},
      "permissive": false
    },
    {
      "name": "check_error_post_adhoc",
      "taskReferenceName": "check_error_post_adhoc_ref",
      "inputParameters": {
        "case_value_param": "${workflow.variables.postAdhoc_output_status}"
      },
      "type": "SWITCH",
      "decisionCases": {
        "error": [
          {
            "name": "adhoc_error_async_response",
            "taskReferenceName": "adhoc_error_async_response_ref",
            "inputParameters": {
              "evaluatorType": "javascript",
              "expression": "function e() { return {\"resultId\":\"\",\"opticalRouteName\":\"${workflow.input.OpticalRouteName}\",\"opticalRouteId\":${workflow.input.OpticalRouteId},\"portId\":\"\",\"portNumber\":\"\",\"testTime\":\"\",\"status\":\"FAILED\",\"linkLength\":\"\",\"linkLoss\":\"\",\"wavelength\":\"\",\"globalStarRating\":\"\",\"orlStarRating\":\"\",\"lossStarRating\":\"\",\"globalVerdict\":\"\",\"globalDeviationVerdict\":\"\",\"completionStatus\":\"\",\"message\":\"ADHOC_CALL_FAILED\" }} e();"
            },
            "type": "INLINE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          },
          {
            "name": "error_sync_response",
            "taskReferenceName": "error_sync_response",
            "inputParameters": {
              "terminationStatus": "COMPLETED",
              "workflowOutput": "${adhoc_error_async_response_ref.output}"
            },
            "type": "TERMINATE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          }
        ]
      },
      "defaultCase": [
        {
          "name": "listener",
          "taskReferenceName": "listener",
          "inputParameters": {
            "http_request": {
              "connectionTimeOut": 90000,
              "readTimeOut": 90000,
              "uri": "some internal api",
              "method": "PUT",
              "body": {
                "timeout": 240000000,
                "url": "some internal url",
                "method": "POST",
                "callbackBodyTemplateOnReceived": "{\"workflowInstanceId\": \"${workflow.workflowId}\", \"taskId\": \"${CPEWF_TASK_ID}\", \"status\": \"COMPLETED\", \"outputData\": {\"rtuResponse\": {\"headers\": {headers}, \"body\": {body}}}}",
                "callbackBodyTemplateOnTimeout": "{\"workflowInstanceId\": \"${workflow.workflowId}\", \"taskId\": \"${CPEWF_TASK_ID}\", \"status\": \"FAILED\",  \"outputData\": {\"resultId\":\"\",\"opticalRouteName\":\"${workflow.input.OpticalRouteName}\",\"opticalRouteId\":${workflow.input.OpticalRouteId},\"portId\":\"\",\"portNumber\":\"\",\"testTime\":\"\",\"status\":\"FAILED\",\"linkLength\":\"\",\"linkLoss\":\"\",\"wavelength\":\"\",\"globalStarRating\":\"\",\"orlStarRating\":\"\",\"lossStarRating\":\"\",\"globalVerdict\":\"\",\"globalDeviationVerdict\":\"\",\"completionStatus\":\"\",\"message\":\"ADHOC_CALL_FAILED\"}}"
              }
            }
          },
          "type": "HTTP",
          "decisionCases": {},
          "defaultCase": [],
          "forkTasks": [],
          "startDelay": 0,
          "joinOn": [],
          "optional": true,
          "defaultExclusiveJoinTask": [],
          "asyncComplete": true,
          "loopOver": [],
          "onStateChange": {},
          "permissive": false
        }
      ],
      "forkTasks": [],
      "startDelay": 0,
      "joinOn": [],
      "optional": false,
      "defaultExclusiveJoinTask": [],
      "asyncComplete": false,
      "loopOver": [],
      "evaluatorType": "javascript",
      "expression": "(function() { if ($.case_value_param != 'COMPLETED') return 'error' })() ",
      "onStateChange": {},
      "permissive": false
    },
    {
      "name": "check_error_async_response",
      "taskReferenceName": "check_error_async_response",
      "inputParameters": {
        "case_value_param": "${listener.status}",
        "result_type_param": "${listener.output.rtuResponse.headers.FgResultType}"
      },
      "type": "SWITCH",
      "decisionCases": {
        "error": [
          {
            "name": "adhoc_listener_error_async_response",
            "taskReferenceName": "adhoc_listener_error_async_response_ref",
            "inputParameters": {
              "evaluatorType": "javascript",
              "expression": "function e() { return {\"resultId\":\"\",\"opticalRouteName\":\"${workflow.input.OpticalRouteName}\",\"opticalRouteId\":${workflow.input.OpticalRouteId},\"portId\":\"\",\"portNumber\":\"\",\"testTime\":\"\",\"status\":\"FAILED\",\"linkLength\":\"\",\"linkLoss\":\"\",\"wavelength\":\"\",\"globalStarRating\":\"\",\"orlStarRating\":\"\",\"lossStarRating\":\"\",\"globalVerdict\":\"\",\"globalDeviationVerdict\":\"\",\"completionStatus\":\"\",\"message\":\"ADHOC_CALL_FAILED\" }} e();"
            },
            "type": "INLINE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          },
          {
            "name": "error_async_response",
            "taskReferenceName": "error_async_response",
            "inputParameters": {
              "terminationStatus": "COMPLETED",
              "workflowOutput": "${adhoc_listener_error_async_response_ref.output}"
            },
            "type": "TERMINATE",
            "decisionCases": {},
            "defaultCase": [],
            "forkTasks": [],
            "startDelay": 0,
            "joinOn": [],
            "optional": false,
            "defaultExclusiveJoinTask": [],
            "asyncComplete": false,
            "loopOver": [],
            "onStateChange": {},
            "permissive": false
          }
        ]
      },
      "defaultCase": [
        {
          "name": "extractResultId",
          "taskReferenceName": "extractResultId_ref",
          "inputParameters": {
            "value": "${listener.output.rtuResponse.body}",
            "evaluatorType": "javascript",
            "expression": "(function(){var parsed=JSON.parse($.value);var linkResults=parsed.brief.LinkResults.Results;return{resultId:parsed.resultid,opticalRouteName:parsed.metadata.AssetName,opticalRouteId:parsed.metadata.AssetId,portId:parsed.metadata.PortId,portNumber:parsed.metadata.PortId,testTime:parsed.metadata.TestTime,status:\"COMPLETED\",linkLength:parsed.brief.LinkResults.Length,linkLoss:linkResults[0].Loss,wavelength:linkResults[0].Wavelength,globalStarRating:parsed.brief.GlobalStarRating,orlStarRating:parsed.brief.LinkResults.OrlStarRating,lossStarRating:parsed.brief.LinkResults.LossStarRating,linkResults:linkResults.map(function(result){return{linkLoss:result.Loss,wavelength:result.Wavelength}}),globalVerdict:parsed.brief.GlobalVerdict,globalDeviationVerdict:parsed.brief.Measurement.GlobalDeviationVerdict,completionStatus:parsed.brief.LinkResults.CompletionStatus,message:\"\"}})();"
          },
          "type": "INLINE",
          "decisionCases": {},
          "defaultCase": [],
          "forkTasks": [],
          "startDelay": 0,
          "joinOn": [],
          "optional": false,
          "defaultExclusiveJoinTask": [],
          "asyncComplete": false,
          "loopOver": [],
          "onStateChange": {},
          "permissive": false
        },
        {
          "name": "terminate_with_success",
          "taskReferenceName": "terminate_with_success",
          "inputParameters": {
            "terminationStatus": "COMPLETED",
            "workflowOutput": "${extractResultId_ref.output}"
          },
          "type": "TERMINATE",
          "decisionCases": {},
          "defaultCase": [],
          "forkTasks": [],
          "startDelay": 0,
          "joinOn": [],
          "optional": false,
          "defaultExclusiveJoinTask": [],
          "asyncComplete": false,
          "loopOver": [],
          "onStateChange": {},
          "permissive": false
        }
      ],
      "forkTasks": [],
      "startDelay": 0,
      "joinOn": [],
      "optional": false,
      "defaultExclusiveJoinTask": [],
      "asyncComplete": false,
      "loopOver": [],
      "evaluatorType": "javascript",
      "expression": "(function() { if ($.case_value_param != 'COMPLETED' || $.result_type_param.toUpperCase() == \"ERRORJSON\") return 'error' })()",
      "onStateChange": {},
      "permissive": false
    }
  ],
  "inputParameters": [],
  "outputParameters": {},
  "schemaVersion": 2,
  "restartable": true,
  "workflowStatusListenerEnabled": false,
  "ownerEmail": "exfo@exfo.com",
  "timeoutPolicy": "ALERT_ONLY",
  "timeoutSeconds": 0,
  "variables": {},
  "inputTemplate": {},
  "enforceSchema": true,
  "metadata": {},
  "maskedFields": []
}

Detail Description

The issue started appearing after we upgraded Conductor from 3.19.0 to 3.21.23. It can occur after any HTTP task within a subworkflow.

In the HTTP task postAdhoc_case1_ref, an issue in our API causes the response payload to be returned as a UUID in text/plain format. Conductor nevertheless attempts to parse the response as application/json, which produces a large number of JsonParseException entries in the logs. Although this adds significant noise to the logs, we are not certain that it directly causes the stall, because most workflows still complete successfully despite these parsing errors.
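
To illustrate the parsing noise outside Conductor, here is a minimal Python sketch. It is not Conductor's code (Conductor is written in Java); it only shows why a bare UUID is rejected by a strict JSON parser and how a client could fall back to the raw text. The function name and the sample UUID are hypothetical.

```python
import json


def parse_response_body(raw: str):
    """Return parsed JSON when possible, otherwise the raw text unchanged."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # A bare UUID is not a valid JSON document, so strict parsing fails.
        # Falling back to the raw string keeps the payload usable.
        return raw


# Hypothetical payloads, for illustration only.
plain_uuid = "f47ac10b-58cc-4372-a567-0e02b2c3d479"    # text/plain style body
json_uuid = '"f47ac10b-58cc-4372-a567-0e02b2c3d479"'  # same value as a JSON string

print(parse_response_body(plain_uuid))
print(parse_response_body(json_uuid))
```

If the API returned the UUID as a quoted JSON string, or the body were handled as plain text, the parse errors would presumably disappear. Whether that has any effect on the stall is unknown.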

From the logs and Conductor UI:

  • The HTTP task completes normally with a valid response
  • There are no visible errors at the task level
  • However, the workflow does not progress beyond this point

This behavior is intermittent and not consistently reproducible.
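
Because the stall is intermittent, it may help to detect affected executions automatically. The following Python sketch is a heuristic only: it flags an execution whose workflow status is RUNNING while every task in it is already in a terminal state and nothing has finished for a while. It assumes the execution is available as a dict with "status", "tasks", and per-task "status" and "endTime" (epoch milliseconds) fields; the function name and the idle threshold are hypothetical.

```python
import time

# Task statuses treated as terminal in this sketch (assumed to match
# the status names shown by Conductor for finished tasks).
TERMINAL_TASK_STATUSES = {
    "COMPLETED",
    "COMPLETED_WITH_ERRORS",
    "FAILED",
    "FAILED_WITH_TERMINAL_ERROR",
    "CANCELED",
    "TIMED_OUT",
    "SKIPPED",
}


def looks_stalled(execution: dict, now_ms: int, min_idle_ms: int = 300_000) -> bool:
    """Heuristic: workflow is RUNNING, all tasks are terminal, and idle too long."""
    if execution.get("status") != "RUNNING":
        return False
    tasks = execution.get("tasks", [])
    if not tasks:
        return False
    if any(t.get("status") not in TERMINAL_TASK_STATUSES for t in tasks):
        return False
    # A healthy workflow can briefly be between tasks, so require some idle time.
    last_end_ms = max(t.get("endTime", 0) for t in tasks)
    return now_ms - last_end_ms >= min_idle_ms


# Hypothetical execution snapshot resembling the reported stall.
sample = {
    "status": "RUNNING",
    "tasks": [
        {"referenceTaskName": "check_adhoc_test_type_ref", "status": "COMPLETED", "endTime": 1_000},
        {"referenceTaskName": "get_testConfig_ref", "status": "COMPLETED", "endTime": 2_000},
    ],
}
print(looks_stalled(sample, now_ms=int(time.time() * 1000)))
```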

Workaround Found

When the stalled subworkflow is manually paused and resumed via the Conductor UI, it schedules the next task normally, and the workflow then completes successfully.

Although we have this workaround, we would like to understand the root cause and find a proper fix. Ideally, workflows should progress to the next task on their own, without manual intervention such as pausing and resuming them through the Conductor UI.
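
For reference, the same pause-and-resume workaround could be scripted instead of clicking through the UI. The Python sketch below assumes the server exposes PUT {base}/workflow/{workflowId}/pause and PUT {base}/workflow/{workflowId}/resume, which is how we understand Conductor's workflow REST resource; please verify the paths against your server's Swagger UI. The base URL and the helper names are hypothetical.

```python
from typing import List
from urllib.request import Request, urlopen


def build_pause_resume_requests(base_url: str, workflow_id: str) -> List[Request]:
    """Build the two PUT requests (pause, then resume) without sending them."""
    base = base_url.rstrip("/")
    return [
        Request(f"{base}/workflow/{workflow_id}/pause", method="PUT"),
        Request(f"{base}/workflow/{workflow_id}/resume", method="PUT"),
    ]


def pause_and_resume(base_url: str, workflow_id: str) -> None:
    """Send pause, then resume; urlopen raises HTTPError on a 4xx/5xx response."""
    for request in build_pause_resume_requests(base_url, workflow_id):
        with urlopen(request, timeout=30) as response:
            response.read()


# Example (not executed here): the base URL is an assumption about the deployment.
# pause_and_resume("http://localhost:8080/api", "<stalled-subworkflow-id>")
```

This only automates the workaround; it does not address why the next task is not scheduled in the first place.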
