Signal handling in python client #94

@timl3136

Signal Handling in the Cadence Python Client: Deterministic Communication with Workflows

Signal handling in the Cadence Python client enables workflows to receive asynchronous messages from the outside world without interrupting their execution. Unlike traditional RPC-style communication, signals are designed to arrive at any point during a workflow's run and be processed deterministically through the same event loop that drives the entire workflow. In this post, we'll explore how signals work, why determinism matters, and how to use them in your workflows.

What Are Signals and Why Do We Need Them?

Workflows often need to respond to external events that arrive unpredictably. Consider a document approval workflow: it runs for days or weeks, fetching data, running approval checks, and coordinating with other systems. At any moment, a human manager might click "Approve" on a dashboard, and that signal needs to reach the workflow immediately.

Signals are how the outside world communicates with a running Cadence workflow without interrupting it. Common use cases include:

  • Human-in-the-loop approval: Workflows that wait for external approval before proceeding
  • Configuration broadcasting: Sending configuration changes to running workflows in real-time
  • Multi-stage pipeline coordination: One workflow signaling another to proceed to the next stage
  • Periodic liveness checks: External systems monitoring the health of long-running workflows

Signals are fundamentally different from activities: they carry no return value, they must not fail the workflow on delivery, and they arrive asynchronously relative to the workflow's own execution. The workflow code must be able to receive a signal at any point during its run, regardless of what it is currently doing.

How Signals Work: The Design

Defining Signal Handlers

Signal handlers are defined by decorating workflow methods with @workflow.signal(name=...). During workflow-definition setup, the SDK scans the workflow class for these decorated methods and builds a mapping from each signal name to its handler definition.

At runtime, when the workflow engine encounters a signal event in the decision-task history, it extracts the signal name and looks it up in that mapping. If a handler is registered, the signal payload is decoded according to the handler's signature and the handler is scheduled on the deterministic event loop. If no handler is registered for the name, the signal is logged and dropped.
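The scan-and-lookup scheme can be sketched in plain Python. This is an illustrative model only: the names `signal`, `collect_signal_handlers`, and `dispatch_signal` are hypothetical and not the SDK's real API.

```python
import logging

logger = logging.getLogger(__name__)

def signal(name):
    """Decorator that tags a method as the handler for signal `name`."""
    def wrap(fn):
        fn._signal_name = name
        return fn
    return wrap

def collect_signal_handlers(workflow_cls):
    """Scan the class once, building a name -> handler mapping."""
    handlers = {}
    for attr in vars(workflow_cls).values():
        sig_name = getattr(attr, "_signal_name", None)
        if sig_name is not None:
            handlers[sig_name] = attr
    return handlers

def dispatch_signal(instance, handlers, name, *payload):
    handler = handlers.get(name)
    if handler is None:
        # Unknown signal names are logged and dropped; they never fail the workflow.
        logger.warning("dropping signal with no registered handler: %s", name)
        return
    handler(instance, *payload)

class Greeter:
    def __init__(self):
        self.greeting = None

    @signal(name="set_greeting")
    def set_greeting(self, text):
        self.greeting = text

handlers = collect_signal_handlers(Greeter)
g = Greeter()
dispatch_signal(g, handlers, "set_greeting", "hello")  # handled
dispatch_signal(g, handlers, "unknown", "ignored")     # logged and dropped
```

The real SDK additionally decodes the payload against the handler's signature before invoking it; this sketch only shows the registration and lookup shape.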

Event Routing and Dispatch

Signal handling in this SDK is built on three parts: event routing that reads signal events from decision-task history, dispatch that schedules handlers on the deterministic event loop, and predicate waiting that lets workflow code pause until a condition becomes true.

When the Cadence server delivers a decision task, the workflow engine iterates the task's history. When a signal event is received, the workflow engine does not invoke the handler immediately while reading history. Signal events are read in history order, and signal handling work is queued and processed in that same FIFO order. During replay, the same history produces the same queued work, so replayed signals are handled in the same order as live signals.

This is the key to determinism: we don't execute signals as they arrive in real-time. Instead, we queue them in the order they were originally processed and replay them deterministically. When your workflow replays its history (which happens every time the workflow resumes after an interruption), it sees the exact same signals in the exact same order, producing the exact same results.
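A toy model of the queue-then-drain scheme makes the replay property concrete: signals are enqueued in history order and drained FIFO, so processing the same history twice yields identical state transitions. The event shape and function name here are illustrative, not the SDK's.

```python
from collections import deque

def process_history(history):
    queue = deque()
    for event in history:                 # read history strictly in order
        if event["type"] == "signaled":
            queue.append(event)           # queue; never invoke inline
    handled = []
    while queue:                          # drain in the same FIFO order
        handled.append(queue.popleft()["name"])
    return handled

history = [
    {"type": "signaled", "name": "approve"},
    {"type": "other"},
    {"type": "signaled", "name": "reject"},
]
# Replaying the same history produces the same handling order.
assert process_history(history) == process_history(history)
```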

Sync vs. Async Signal Handlers

Sync and async signal handlers follow the same dispatch path through the event loop; the difference lies in how each handler executes. A sync handler runs to completion in a single turn. An async handler is scheduled as a task on that same deterministic event loop: it may pause at an await point and resume in a later loop turn. If another signal arrives while an async handler is suspended, it is queued and processed by the same event loop in deterministic order rather than being allowed to race arbitrarily ahead. In both cases, signal handling is driven entirely by the deterministic event loop.
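The interleaving can be demonstrated with plain asyncio (standing in for the SDK's deterministic loop, which it is not): a sync callback runs to completion in its turn, while an async handler suspends at an await and resumes in a later turn.

```python
import asyncio

events = []

def sync_handler():
    # A sync handler runs to completion within one loop turn.
    events.append("sync done")

async def async_handler():
    events.append("async start")
    await asyncio.sleep(0)        # suspend; the loop moves on to queued work
    events.append("async resume") # resumes in a later loop turn

async def main():
    loop = asyncio.get_running_loop()
    task = asyncio.ensure_future(async_handler())  # scheduled as a task
    loop.call_soon(sync_handler)                   # queued sync callback
    await task

asyncio.run(main())
# The async handler starts, the sync handler runs during its suspension,
# then the async handler resumes: all on the one event loop.
```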

The Wait Condition Primitive

A workflow often needs to pause until certain conditions are met. Rather than writing polling loops or busy-waiting logic, you can use the wait_condition primitive to cleanly express "pause here until this condition becomes true."

class ApprovalWorkflow:
    def __init__(self):
        self.approvals = 0
        self.required = 2

    @workflow.signal(name="approve")
    def approve(self) -> None:
        self.approvals += 1

    @workflow.run
    async def run(self) -> str:
        # Suspend until at least two approvals have arrived.
        await workflow.wait_condition(lambda: self.approvals >= self.required)
        return "fully approved"

In this example, the workflow waits indefinitely until it has received two "approve" signals.

How wait_condition Works Under the Hood

At a high level, wait_condition works like this:

when workflow calls wait_condition(predicate):
    create waiter for predicate
    if predicate is already true:
        continue immediately
    else:
        pause workflow

on each event loop iteration:
    run queued callbacks
    re-check each waiter predicate
    if a predicate becomes true:
        resolve that waiter
        queue the blocked workflow to continue

This design means a signal handler can finish updating workflow state before the worker checks whether any blocked condition is resolved. It lets the workflow resume immediately after a signal resolves its blocked condition, without waiting for another event loop iteration.
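The pseudocode above can be modeled with a tiny scheduler. This is a minimal sketch with invented names (`MiniLoop`, `turn`), not the SDK's loop: each turn drains queued callbacks first, then re-checks waiter predicates, so a handler's state update is visible before blocked conditions are tested.

```python
from collections import deque

class MiniLoop:
    def __init__(self):
        self.ready = deque()   # queued callbacks (e.g. signal handlers)
        self.waiters = []      # (predicate, resume_callback) pairs

    def call_soon(self, cb):
        self.ready.append(cb)

    def wait_condition(self, predicate, resume):
        if predicate():
            resume()           # already true: continue immediately
        else:
            self.waiters.append((predicate, resume))

    def turn(self):
        while self.ready:      # 1. run all queued callbacks first
            self.ready.popleft()()
        still_waiting = []
        for predicate, resume in self.waiters:
            if predicate():    # 2. re-check predicates after state updates
                self.call_soon(resume)   # queue blocked workflow to continue
            else:
                still_waiting.append((predicate, resume))
        self.waiters = still_waiting
        while self.ready:      # 3. resume within the same turn
            self.ready.popleft()()

state = {"approvals": 0, "resumed": False}
loop = MiniLoop()
loop.wait_condition(lambda: state["approvals"] >= 2,
                    lambda: state.__setitem__("resumed", True))
loop.call_soon(lambda: state.__setitem__("approvals", 2))  # "signal handler"
loop.turn()
# The workflow resumes in the same turn that the handler updated state.
```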

Understanding the Architecture

Here's a visual representation of how signals flow through the system:

sequenceDiagram
    participant Server as Cadence Server
    participant Engine as WorkflowEngine
    participant EventLoop as DeterministicEventLoop
    participant Workflow as Workflow Instance
    participant SyncHandler as Sync Signal Handler
    participant AsyncHandler as Async Signal Handler

    rect rgb(220, 235, 255)
        Note over Workflow,EventLoop: Workflow is paused, waiting for a condition
    end

    rect rgb(220, 255, 220)
        Note over Server,Workflow: Signal arrives from server
        Server->>Engine: PollForDecisionTask (signal event in history)
        Engine->>EventLoop: queue signal handling work
        Engine->>EventLoop: run_until_yield()
        activate EventLoop
        alt Sync signal handler
            EventLoop->>SyncHandler: schedule and run handler
            activate SyncHandler
            SyncHandler-->>EventLoop: runs to completion and updates state
            deactivate SyncHandler
        else Async signal handler
            EventLoop->>AsyncHandler: schedule and run handler
            activate AsyncHandler
            AsyncHandler-->>EventLoop: may yield before finishing state update
            deactivate AsyncHandler
        end
        EventLoop->>Workflow: re-check waiting condition
        Note over Workflow: Condition is now true
        EventLoop->>Workflow: resume workflow
        Note over Workflow: Workflow continues with updated state
        deactivate EventLoop
    end

The diagram shows how a signal arrives from the Cadence server, gets queued, and is processed on the deterministic event loop. The flow illustrates both synchronous and asynchronous signal handlers running on the event loop, and how the waiting condition is re-checked once the handler completes. When the condition becomes true, the paused workflow is resumed.

A Practical Example: Human-in-the-Loop Approval

Let's build a more complete example that demonstrates signals in action:

class DocumentApprovalWorkflow:
    def __init__(self):
        self.approved = False
        self.rejected = False
        self.rejection_reason = ""

    @workflow.signal(name="approve")
    def approve(self) -> None:
        self.approved = True

    @workflow.signal(name="reject")
    def reject(self, reason: str) -> None:
        self.rejected = True
        self.rejection_reason = reason

    @workflow.run
    async def run(self, document_id: str) -> str:
        # Load the document
        document = await workflow.execute_activity(
            load_document, 
            document_id
        )
        
        # Wait until one of the two conditions is true
        await workflow.wait_condition(
            lambda: self.approved or self.rejected
        )
        
        if self.approved:
            await workflow.execute_activity(
                publish_document,
                document
            )
            return "Document approved and published"
        else:
            await workflow.execute_activity(
                archive_document,
                document,
                self.rejection_reason
            )
            return f"Document rejected: {self.rejection_reason}"

This workflow loads a document, then waits for external approval or rejection. Once a signal arrives, it transitions to the appropriate next step. The beauty of this approach: the workflow code reads linearly and naturally. There's no event-driven callback spaghetti, no message routing tables, no complex state machines.

Why Determinism Matters

You might wonder: why does the SDK queue signals instead of invoking handlers immediately? The answer is determinism, and it's critical to how Cadence workflows work.

Every workflow can be interrupted at any moment—the process crashes, the network breaks, the data center fails. When the workflow resumes, Cadence replays the entire history to get back to where it was. During replay, all decisions must produce the same results as they did originally. This is how Cadence guarantees that your workflow logic is fault-tolerant: if decision N produced result X before, it must produce X again during replay, every single time.

If signals were processed in real-time order as they arrived, replay would be non-deterministic. A signal arriving at a slightly different moment during replay could change the flow. By queuing signals in the order they arrived (as recorded in the history), we ensure that replay produces identical behavior. The workflow sees the exact same signals in the exact same order, triggering the same state changes, every time.

Key Takeaways

  • Deterministic execution: signal order is guaranteed during replay
  • Non-blocking delivery: signals never fail workflows
  • Clean API: simple decorator-based registration
  • Flexible pausing: wait_condition eliminates polling loops
  • Async-first: native support for both sync and async handlers
  • Production-ready: battle-tested in production Cadence deployments

Signals solve a real problem: how to let the outside world communicate with long-running workflows without breaking determinism. The implementation in the Cadence Python client brings this capability to Python developers, enabling patterns like human-in-the-loop workflows, dynamic configuration updates, and cross-workflow coordination—all while maintaining the deterministic guarantees that make Cadence workflows so reliable.

Get Started

To use signals in your workflows, check out the cadence-workflow/cadence-python-client repository and start building deterministic, signal-driven workflows today.
