| 1 | +--- |
| 2 | +id: serverless-workers |
| 3 | +title: Troubleshoot Serverless Workers |
| 4 | +sidebar_label: Serverless Workers |
| 5 | +description: |
| 6 | + Diagnose and fix issues with Temporal Serverless Workers on AWS Lambda by tracing the invocation flow from Task Queue |
| 7 | + to Worker execution. |
| 8 | +toc_max_heading_level: 4 |
| 9 | +keywords: |
| 10 | + - serverless |
| 11 | + - lambda |
| 12 | + - troubleshooting |
| 13 | + - worker |
| 14 | + - invocation |
| 15 | +tags: |
| 16 | + - Workers |
| 17 | + - Serverless |
| 18 | + - Troubleshooting |
| 19 | + - AWS Lambda |
| 20 | +--- |
| 21 | + |
| 22 | +:::tip SUPPORT, STABILITY, and DEPENDENCY INFO |
| 23 | + |
| 24 | +Serverless Workers are in [Pre-release](/evaluate/development-production-features/release-stages#pre-release). |
| 25 | + |
| 26 | +APIs are experimental and may be subject to backwards-incompatible changes. |
| 27 | + |
| 28 | +::: |
| 29 | + |
| 30 | +import Tabs from '@theme/Tabs'; |
| 31 | +import TabItem from '@theme/TabItem'; |
| 32 | + |
| 33 | +This page walks through the Serverless Worker invocation flow and helps you identify where a failure is occurring. |
| 34 | + |
| 35 | +When a Serverless Worker invocation works correctly, the following sequence happens: |
| 36 | + |
| 37 | +1. You deploy the Worker function on Lambda. |
| 38 | +2. You configure a [Worker Deployment Version](/worker-versioning#worker-deployment-version) with a compute provider. This starts a [Worker Controller Instance (WCI)](/serverless-workers#how-invocation-works) Workflow and a validation invocation of the Lambda function. |
| 39 | +3. The Worker running in the Lambda function polls the Temporal Service successfully, which binds the [Task Queue](/encyclopedia/task-queues) configured on the Worker to the Worker Deployment Version. |
| 40 | +4. The WCI monitors the associated Task Queue on a schedule. The [Matching Service](/clusters#matching-service) also notifies the WCI Workflow of sync match failures immediately as they happen. |
| 41 | +5. A Task arrives on the Task Queue and the WCI detects the backlog. |
| 42 | +6. The WCI invokes the Lambda function. |
| 43 | +7. The Lambda function starts, and the Worker connects to Temporal and polls the Task Queue. |
| 44 | +8. The Worker processes Tasks and shuts down gracefully. |
| 45 | + |
| 46 | +Start by determining whether the Lambda function is being invoked at all, then narrow down from there. |
| 47 | + |
| 48 | +## Is the Lambda function being invoked? {#is-lambda-invoked} |
| 49 | + |
| 50 | +Check the Lambda function's CloudWatch metrics or invocation logs. |
| 51 | + |
| 52 | +In the AWS Console, go to **Lambda > Functions > your function > Monitor**. Look for recent invocations in the |
| 53 | +**Invocations** graph. You can also check **CloudWatch > Log groups > /aws/lambda/your-function-name** for execution |
| 54 | +logs. |
| 55 | + |
| 56 | +If there are no invocations, continue to [Lambda is not being invoked](#lambda-not-invoked). |
| 57 | + |
| 58 | +If the Lambda is being invoked but Workflows are not progressing, skip to |
| 59 | +[Lambda is invoked but Tasks are not completing](#lambda-invoked-not-completing). |
| 60 | + |
| 61 | +## Lambda is not being invoked {#lambda-not-invoked} |
| 62 | + |
| 63 | +Work through the following checks in order. |
| 64 | + |
| 65 | +### Validate the connection to Lambda {#validate-connection} |
| 66 | + |
| 67 | +Start by verifying that Temporal can reach the Lambda function. In the Temporal UI, go to **Workers > Deployments > |
| 68 | +select your deployment**, open the **Actions** menu on the version, and click **Validate Connection**. A successful validation |
| 69 | +confirms that the Worker Deployment Version has a compute provider configured, that Temporal can assume the invocation |
| 70 | +role, and that the Lambda function can be invoked. |
| 71 | + |
| 72 | +If validation fails, verify that the Lambda function ARN and invocation role ARN in the Worker Deployment Version |
| 73 | +configuration are correct. Also verify that the invocation role was created using the |
| 74 | +[CloudFormation template](/production-deployment/worker-deployments/serverless-workers/aws-lambda#create-invocation-role) |
| 75 | +and that the External ID matches the value in the Worker Deployment Version configuration. |
| 76 | + |
| 77 | +If the Worker Deployment Version does not have a compute provider configured, no |
| 78 | +[Worker Controller Instance (WCI)](/serverless-workers#how-invocation-works) Workflow exists and the Lambda is never |
| 79 | +automatically invoked. A common cause is manually invoking the Lambda function before creating the Worker Deployment |
| 80 | +Version in the UI or CLI. When the Lambda runs, the Worker connects to Temporal and polls the Task Queue. That polling |
| 81 | +registers the Worker Deployment Version and binds the Task Queue on the server, but the version has no compute provider. |
| 82 | +To fix the issue, create or update the Worker Deployment Version with the compute provider flags as described in the |
| 83 | +[deploy guide](/production-deployment/worker-deployments/serverless-workers/aws-lambda#create-worker-deployment-version). |
| 84 | + |
| 85 | +### Check that the version is set as current {#check-version-current} |
| 86 | + |
| 87 | +The Worker Deployment Version must be set as the current version for new Tasks to route to it. If you created the |
| 88 | +version through the CLI, you need to |
| 89 | +[set it as current](/production-deployment/worker-deployments/serverless-workers/aws-lambda#set-current-version). |
| 90 | + |
| 91 | +You can verify the current version with `temporal worker deployment describe`. |
| 92 | + |
| 93 | +### Check that the WCI is detecting Tasks {#check-wci-detecting-tasks} |
| 94 | + |
| 95 | +If the connection validates successfully but the Lambda is still not being invoked, the |
| 96 | +[Worker Controller Instance (WCI)](/serverless-workers#worker-controller-instance) may not be detecting Tasks on the |
| 97 | +Task Queue. |
| 98 | + |
| 99 | +Check which Task Queues are bound to the Worker Deployment Version and whether there is a backlog: |
| 100 | + |
| 101 | +```bash |
| 102 | +temporal worker deployment describe-version \ |
| 103 | + --namespace <NAMESPACE> \ |
| 104 | + --deployment-name <DEPLOYMENT_NAME> \ |
| 105 | + --build-id <BUILD_ID> \ |
| 106 | + --report-task-queue-stats |
| 107 | +``` |
| 108 | + |
| 109 | +If no Task Queues are listed, the binding has not been established. The server binds a Task Queue to a Worker Deployment |
| 110 | +Version when a Worker with that deployment version successfully connects and polls the Task Queue. |
| 111 | + |
| 112 | +A common cause is a failed first invocation. When you create a Worker Deployment Version, the WCI invokes the Lambda to |
| 113 | +validate the configuration. If that first invocation fails (for example, due to missing environment variables, incorrect |
| 114 | +TLS configuration, or missing dependencies), the Worker never connects to Temporal and never polls. Without a successful |
| 115 | +poll, the Task Queue binding is never created. |
| 116 | + |
| 117 | +To diagnose a failed first invocation, invoke the Lambda function manually from the AWS Console. The console displays |
| 118 | +the execution result and any errors directly, making it easier to identify configuration issues than searching through |
| 119 | +CloudWatch logs. Once the Lambda runs successfully and the Worker connects to Temporal, the Task Queue binding is |
| 120 | +established. |
| 121 | + |
| 122 | +## Lambda is invoked but Tasks are not completing {#lambda-invoked-not-completing} |
| 123 | + |
| 124 | +If CloudWatch shows Lambda invocations but Workflows are not progressing, the problem is in the Worker's execution |
| 125 | +within the Lambda function. |
| 126 | + |
| 127 | +### Check Lambda execution logs {#check-execution-logs} |
| 128 | + |
| 129 | +Check CloudWatch logs for errors during Worker startup. In the AWS Console, go to **CloudWatch > Log groups > |
| 130 | +/aws/lambda/your-function-name** and look for recent error messages. |
| 131 | + |
| 132 | +Common errors include: |
| 133 | + |
| 134 | +- **Connection failures**: The Worker cannot reach the Temporal Service. Check that the `TEMPORAL_ADDRESS` and |
| 135 | + `TEMPORAL_API_KEY` environment variables (or `temporal.toml` config file) are correctly set on the Lambda function. |
| 136 | + For self-hosted deployments, verify |
| 137 | + [network reachability](/production-deployment/worker-deployments/serverless-workers/self-hosted-setup#ensure-reachability). |
| 138 | +- **TLS errors**: The TLS certificate or key is missing, expired, or does not match the Namespace. |
| 139 | +- **Authentication errors**: The API key is invalid or does not have access to the Namespace. |
| 140 | + |
| 141 | +### Check for Lambda timeout {#check-lambda-timeout} |
| 142 | + |
| 143 | +If the Lambda function reaches its configured timeout before the Worker finishes processing, AWS terminates the |
| 144 | +invocation. |
| 145 | + |
| 146 | +The Worker begins graceful shutdown before the Lambda deadline. If Activities take longer than the available execution |
| 147 | +window, the Activities are abandoned mid-execution and retried on the next invocation. |
| 148 | + |
| 149 | +For long-running Activities, increase the Lambda timeout and the Worker's shutdown buffer together. See |
| 150 | +[Tuning for long-running Activities](/serverless-workers#tuning-for-long-running-activities) for guidance on how these |
| 151 | +values relate. |
| 152 | + |
| 153 | +### Check that the deployment name and build ID match {#check-deployment-match} |
| 154 | + |
| 155 | +If CloudWatch shows rapid, repeated invocations with no Workflow progress, the deployment name or build ID in the Worker |
| 156 | +code may not match the Worker Deployment Version configuration. |
| 157 | + |
| 158 | +The deployment name and build ID in your Lambda function code must exactly match the values you used when creating the |
| 159 | +Worker Deployment Version. Compare the values in your code against the WCI Workflow ID |
| 160 | +(`temporal-sys-worker-controller-instance:<deployment-name>:<build-id>`) and the output of |
| 161 | +`temporal worker deployment describe`. |
| 162 | + |
| 163 | +A mismatch causes an invocation loop: the WCI invokes the Lambda, the Worker starts and polls with a different |
| 164 | +deployment version than the WCI expects, the Task is not processed, and the WCI invokes the Lambda again. |
| 165 | + |
| 166 | +To fix the loop, update the deployment name and build ID in the Worker code to match the Worker Deployment Version, then |
| 167 | +redeploy the Lambda function. |
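
The comparison described in the last section can be sketched as a small shell helper. This is only an illustration, not part of the Temporal CLI: the function names `wci_workflow_id` and `check_wci_match` are hypothetical, and the only thing taken from the page is the Workflow ID format `temporal-sys-worker-controller-instance:<deployment-name>:<build-id>`. It assumes you have already copied the observed WCI Workflow ID by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not provided by Temporal) for spotting a
# deployment name / build ID mismatch.

# Build the WCI Workflow ID implied by a deployment name and build ID,
# using the format quoted on this page.
wci_workflow_id() {
  local deployment_name="$1" build_id="$2"
  printf 'temporal-sys-worker-controller-instance:%s:%s\n' \
    "$deployment_name" "$build_id"
}

# Compare the values used in the Worker code with an observed WCI Workflow ID.
# Returns 0 on a match; returns 1 and prints both IDs to stderr on a mismatch.
check_wci_match() {
  local deployment_name="$1" build_id="$2" observed_id="$3"
  local expected_id
  expected_id="$(wci_workflow_id "$deployment_name" "$build_id")"
  if [ "$expected_id" = "$observed_id" ]; then
    echo "match: $expected_id"
    return 0
  fi
  echo "mismatch: code implies $expected_id but observed $observed_id" >&2
  return 1
}
```

For example, `check_wci_match my-app v2 "temporal-sys-worker-controller-instance:my-app:v1"` would report a mismatch, where `my-app`, `v1`, and `v2` are placeholder values. An exact string comparison is used deliberately, since the page states that the values must match exactly.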