Provide details and actionable info about Non Deterministic Errors #1693

Open
@longquanzheng

Is your feature request related to a problem? Please describe.
This is one example of a current non-deterministic error (NDE):

2024-10-22T20:38:57.8502536Z 2024/10/22 20:38:57 ERROR Workflow panic Namespace default TaskQueue Interpreter_DEFAULT WorkerID 11283@fv-az1498-207@ WorkflowType Interpreter WorkflowID locking1729629466434353139 RunID 96cc0bce-c727-45ab-8c58-32a9928ec882 Attempt 1 Error [TMPRL1100] lookup failed for scheduledEventID to activityID: scheduleEventID: 60, activityID: 60 StackTrace process event for Interpreter_DEFAULT [panic]:
2024-10-22T20:38:57.8505930Z go.temporal.io/sdk/internal.panicIllegalState(...)
2024-10-22T20:38:57.8507126Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_command_state_machine.go:518
2024-10-22T20:38:57.8508780Z go.temporal.io/sdk/internal.(*commandsHelper).handleActivityTaskScheduled(0xc000b779a0, {0xc000a3d88c, 0x2}, 0x3c)
2024-10-22T20:38:57.8510591Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_command_state_machine.go:1150 +0xf8
2024-10-22T20:38:57.8512440Z go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc000e13ea8, 0xc0012bdef0, 0x4?, 0x0)
2024-10-22T20:38:57.8514059Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_event_handlers.go:1217 +0x275
2024-10-22T20:38:57.8515606Z go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc001184c80, 0xc00150da40)
2024-10-22T20:38:57.8517165Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_task_handlers.go:1152 +0x199c
2024-10-22T20:38:57.8518726Z go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc0011a6750, 0xc00150da40, 0xc001184c80, 0xc00111fc50)
2024-10-22T20:38:57.8520250Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_task_handlers.go:917 +0x3bf
2024-10-22T20:38:57.8521674Z go.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc000a0e120, 0xc00150da40)
2024-10-22T20:38:57.8522716Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_task_pollers.go:363 +0x3a3
2024-10-22T20:38:57.8523572Z go.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc000a0e120, {0x14d1660, 0xc00150da40})
2024-10-22T20:38:57.8524404Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_task_pollers.go:325 +0x78
2024-10-22T20:38:57.8525101Z go.temporal.io/sdk/internal.(*baseWorker).processTaskAsync.func1()
2024-10-22T20:38:57.8525809Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_worker_base.go:444 +0x12f
2024-10-22T20:38:57.8526802Z created by go.temporal.io/sdk/internal.(*baseWorker).processTaskAsync in goroutine 5428
2024-10-22T20:38:57.8527590Z 	/home/runner/go/pkg/mod/go.temporal.io/[email protected]/internal/internal_worker_base.go:423 +0x90

In this example, my workflow removed an "UpsertSearchAttribute" API call by mistake.
But it is really hard to figure out from this output why the workflow ran into an NDE. I would expect the error to say so more clearly, for example: "Workflow code tried to schedule an activity, but based on the history events being replayed, it should have upserted a search attribute instead." That way, we would know right away that something is wrong with our code around the search attribute upsert (by looking at the details of the UpsertSearchAttribute event in the history).
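
To make the request concrete, here is a minimal, self-contained sketch of the kind of message I have in mind. It is not the SDK's implementation: the `historyEvent` and `command` types, the `expectedEvent` table, and `describeMismatch` are all hypothetical names made up for illustration, and the event ID is an invented value rather than one taken from the log above.

```go
package main

import "fmt"

// historyEvent is a hypothetical, simplified view of one recorded history event.
type historyEvent struct {
	EventID int64
	Type    string // e.g. "UpsertWorkflowSearchAttributes"
}

// command is a hypothetical, simplified view of what the workflow code
// produced at the same position during replay.
type command struct {
	Type string // e.g. "ScheduleActivityTask"
}

// expectedEvent maps a command type to the history event type it should
// line up with. Only the two cases from this issue are listed (assumption:
// a real table would cover every command type).
var expectedEvent = map[string]string{
	"ScheduleActivityTask":           "ActivityTaskScheduled",
	"UpsertWorkflowSearchAttributes": "UpsertWorkflowSearchAttributes",
}

// describeMismatch returns an empty string when the command matches the
// recorded event, and an actionable message naming both sides otherwise.
func describeMismatch(ev historyEvent, cmd command) string {
	if expectedEvent[cmd.Type] == ev.Type {
		return ""
	}
	return fmt.Sprintf(
		"[nondeterminism] history event %d is %s, but replayed workflow code produced command %s; check the workflow code around that call",
		ev.EventID, ev.Type, cmd.Type,
	)
}

func main() {
	// Invented values: the history recorded an upsert, but the changed
	// workflow code schedules an activity at that point instead.
	ev := historyEvent{EventID: 42, Type: "UpsertWorkflowSearchAttributes"}
	cmd := command{Type: "ScheduleActivityTask"}
	fmt.Println(describeMismatch(ev, cmd))
}
```

The point of the sketch is only that the message names both the recorded event type and the command the code produced, so the reader can go straight to the matching event in the history.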

Describe the solution you'd like
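As described above: the NDE error should state which command the workflow code produced and which history event it was expected to match.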

Describe alternatives you've considered
NA

Additional context
NA

Labels: enhancement (New feature or request)