troubleshooting: serverless workers #4521
lennessyy merged 8 commits into feat/serverless-worker-prerelease
- The invocation role ARN is correct.
- The trust policy on the invocation role allows the Temporal Cloud account to assume the role.
- The External ID in the trust policy matches the External ID in the Worker Deployment Version configuration.
- The invocation role has `lambda:InvokeFunction` permission for the Lambda function ARN.
Suggested change:
  Before: - The invocation role has `lambda:InvokeFunction` permission for the Lambda function ARN.
  After:  - The invocation role has `lambda:InvokeFunction` as well as `lambda:GetFunction` permissions for the Lambda function ARN.
- The Lambda function ARN in the Worker Deployment Version configuration points to an existing function.
- The invocation role ARN is correct.
- The trust policy on the invocation role allows the Temporal Cloud account to assume the role.
I wonder whether we can say something more specific here - e.g. ask them to compare with the template or the like?
I am just going to ask that they make sure to create it with the template. If they created it with the template, there shouldn't be any of these issues.
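For readers who want to run the checks in the list above from a terminal, here is a minimal sketch using the AWS CLI. It is not part of the docs change under review and does not replace creating the role from the template. The helper name `check_invocation_role` and its three arguments are hypothetical, and the sketch assumes the AWS CLI is installed and that the caller is allowed to call `iam:GetRole`, `iam:SimulatePrincipalPolicy`, and `lambda:GetFunction`.

```bash
#!/usr/bin/env bash
# Sketch: inspect an invocation role with the AWS CLI.
# check_invocation_role is a hypothetical helper name, not a Temporal or AWS command.
# Arguments: <role-name> <role-arn> <lambda-function-arn>
check_invocation_role() {
  local role_name="$1" role_arn="$2" function_arn="$3"

  # Print the trust policy so the principal and External ID can be compared by eye.
  aws iam get-role \
    --role-name "$role_name" \
    --query 'Role.AssumeRolePolicyDocument' \
    --output json

  # Ask IAM whether the role's policies allow both actions on the function.
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names lambda:InvokeFunction lambda:GetFunction \
    --resource-arns "$function_arn" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output table

  # Confirm the function ARN points to an existing function.
  aws lambda get-function \
    --function-name "$function_arn" \
    --query 'Configuration.FunctionArn' \
    --output text
}

# Example call (placeholders, fill in your own values):
# check_invocation_role '<ROLE_NAME>' '<ROLE_ARN>' '<LAMBDA_FUNCTION_ARN>'
```

The simulation step only reports what IAM would decide for the role's own policies; the trust policy and External ID still have to be compared against the Worker Deployment Version configuration manually.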
You can view the WCI Workflow by listing system Workflows in your Namespace:

```bash
temporal workflow list \
  --namespace <NAMESPACE> \
  --query 'TemporalNamespaceDivision = "TemporalWorkerControllerInstance"'
```

WCI Workflow IDs follow the pattern `temporal-sys-worker-controller-instance:<deployment-name>:<build-id>`. You can
inspect the WCI's history to see its recent `PullStats` Activity results:

```bash
temporal workflow show \
  --namespace <NAMESPACE> \
  --workflow-id 'temporal-sys-worker-controller-instance:<DEPLOYMENT_NAME>:<BUILD_ID>'
```
is this really the first thing people should do? I would have thought debugging the task queue registration would be first...
Hmm, yeah that's a good point. I'll just move the WCI stuff to another page. I think it's a neat thing that people should know how to view, but yeah the task queue registration is more important here
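Wherever the WCI material ends up, the Workflow ID pattern quoted in the hunk above is easy to mistype. A small sketch of a shell helper that assembles it follows; `wci_workflow_id` is a hypothetical name introduced only for illustration, and the ID layout is taken directly from the pattern in the hunk.

```bash
#!/usr/bin/env bash
# Sketch: build a WCI Workflow ID from a deployment name and build ID.
# wci_workflow_id is a hypothetical helper name, not part of the Temporal CLI.
wci_workflow_id() {
  local deployment_name="$1" build_id="$2"
  printf 'temporal-sys-worker-controller-instance:%s:%s\n' "$deployment_name" "$build_id"
}

# Example (placeholders, fill in your own values):
# temporal workflow show \
#   --namespace '<NAMESPACE>' \
#   --workflow-id "$(wci_workflow_id '<DEPLOYMENT_NAME>' '<BUILD_ID>')"
```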
To diagnose a failed first invocation, check the Lambda function's CloudWatch logs for errors from the initial
invocation. Fix the Lambda configuration, then update the Worker Deployment Version to trigger a new validation
invocation.
I would have the customer re-run the lambda in the AWS console for simplicity - there they can see whether it succeeds and what errors it encounters
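As a command-line complement to the console approach, the CloudWatch logs mentioned in the hunk can also be read with the AWS CLI. The sketch below is an illustration under assumptions: `tail_lambda_logs` is a hypothetical helper name, it relies on AWS CLI v2 (which provides `aws logs tail`), and it assumes the function writes to the default log group `/aws/lambda/<function-name>`.

```bash
#!/usr/bin/env bash
# Sketch: show recent CloudWatch logs for a Lambda function.
# tail_lambda_logs is a hypothetical helper name.
# Arguments: <function-name> [<since>], where <since> defaults to 1h.
tail_lambda_logs() {
  local function_name="$1" since="${2:-1h}"
  # Assumes the default Lambda log group naming; adjust if a custom log group is configured.
  aws logs tail "/aws/lambda/${function_name}" --since "$since" --format short
}

# Example (placeholder, fill in your own function name):
# tail_lambda_logs '<FUNCTION_NAME>' 15m
```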
* docs: Serverless Workers - Go SDK pages (2/4) (#4418) * docs: Serverless Workers - Go SDK pages (2/4) Add Go SDK Serverless Workers documentation including the lambdaworker package guide for AWS Lambda, rewrite run-worker-process to focus on long-lived Workers, and remove cloud-worker (content folded in). Update sidebars, add redirect, and set broken links to warn for cross-PR references. Part of #4405. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: expand OTel section with context and links for serverless Go SDK page Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: link worker defaults to Go SDK reference, clarify ShutdownDeadlineBuffer is serverless-only Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address review feedback on Go SDK pages - Replace ambiguous "Lambda deadline" with "configurable invocation deadline" on the AWS Lambda page (akhayam) - Rewrite tautological "serverless compute" intro on the Go SDK serverless landing page (smuneebahmad) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: Serverless Workers - Deploy guide (3/4) (#4416) * docs: add Serverless Workers production deployment guide (3/4) Add deploy guide for serverless workers covering AWS Lambda deployment, including the serverless-workers index and aws-lambda pages under production-deployment/worker-deployments. Update sidebar navigation and set onBrokenLinks/onBrokenAnchors to warn for cross-PR references. Part of #4405. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address review feedback on Deploy guide - Rename TLS env vars to TEMPORAL_TLS_CLIENT_CERT_PATH and TEMPORAL_TLS_CLIENT_KEY_PATH (smuneebahmad) - Add HOME=/tmp env var to deploy command and env var table; needed for the SDK's config loader to resolve a user config directory in Lambda (smuneebahmad) - Include TEMPORAL_API_KEY in deploy command so the auth path is complete (smuneebahmad) - Remove --scaler-min-instances, --scaler-max-instances from create-version snippet (smuneebahmad) - Remove --ignore-missing-task-queues from set-current-version snippet (smuneebahmad) - Replace CloudFormation template stub with the real template from smuneebahmad, inline in a <details> block plus a downloadable file at /files/temporal-cloud-serverless-worker-role.yaml - Update IAM parameter table to match real template params (AssumeRoleExternalId, LambdaFunctionARNs, RoleName) - Add note flagging template is Cloud-scoped; self-hosted TBD - Add sample aws cloudformation create-stack usage - Use "invocation deadline" for consistency with Go SDK page Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: Serverless Workers - Evaluate pages (1/4) (#4417) * docs: Serverless Workers - Evaluate pages (1/4) Add the Evaluate section for Serverless Workers documentation: - Serverless Workers overview page - Interactive demo page with ServerlessWorkerDemo component - Sidebar entry under Features - Redirect from old demo URL - Change onBrokenLinks/onBrokenAnchors to 'warn' for incremental PRs Part 1 of 4, splitting PR #4405 into smaller PRs. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address PM feedback on evaluate page - Fix polling description: serverless workers still poll, the difference is lifecycle - Tone down operational overhead claims: customers still deploy and configure - Clarify long-running limitation applies to activities, not workflows Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address review feedback on Evaluate pages - Reframe lifecycle description: Temporal invokes the Serverless Worker on demand (bchav, akhayam) - Clarify operational overhead: offload invocation and scaling, but deployments remain the user's responsibility (bchav) - Introduce "long-lived Workers" terminology and use consistently - Sharpen Lambda execution limit wording and Cloud Run callout (akhayam, bchav) - Remove Worker Versioning row from comparison table as too low-level (akhayam) - Remove --scaler-min-instances, --scaler-max-instances, and --ignore-missing-task-queues from demo CLI snippets (smuneebahmad) - Remove Min/Max Instances config fields from demo UI (smuneebahmad) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: Serverless Workers - Encyclopedia (4/4) (#4415) * docs: add Serverless Workers encyclopedia page and update related pages Add the encyclopedia entry for Serverless Workers, update workers.mdx and task-queues.mdx with serverless references, add the architecture diagram, update sidebar, and add "Serverless Worker" to Vale terms. Change onBrokenLinks/onBrokenAnchors to 'warn' to accommodate cross-references to pages in other PRs in this series. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Remove differences section from encyclopedia page The comparison table lives on the evaluate page now. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix Worker invocation wording in encyclopedia Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address review feedback on encyclopedia page - Use "traditional long-lived Worker" on first contrast, "long-lived Worker" thereafter (akhayam, smuneebahmad) - Replace "triggers the compute environment" with "invokes the Serverless Worker on demand" (akhayam) - Use "shuts down" instead of "exits" to match AWS Lambda runtime terminology (akhayam) - Fix "serverless function" -> "Serverless Worker" for cross-provider accuracy - Use "invocation deadline" for consistency with other pages - Add Worker lifecycle section with new lifecycle diagram (addresses akhayam's suggestion to mirror AWS's Lambda lifecycle diagram) - Explain Worker stop timeout and shutdown deadline buffer, including tuning guidance for long-running Activities and the consequences of raising one knob without the other Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: crop lifecycle diagram Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add light/dark themed diagrams with figure captions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update deploy guide with UI steps and real CLI flags Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update lifecycle dark diagram with lighter borders Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: tabify deploy approaches and drop UI positioning sentence Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: restructure deploy step with command-first layout and unified param table Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add ADOT layer reminder and long-running activity tuning link to Go SDK page 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add TLS customization example and execution role policy requirement Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: simplify execution role policy wording Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: inline TLS config in main worker code sample Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add TLS line * docs: Serverless Worker Autoscaling encyclopedia page (5/4) (#4460) * docs: add Serverless Worker Autoscaling encyclopedia page Add a dedicated page explaining how Temporal autoscales Serverless Workers on AWS Lambda, covering scaling signals (backlog + sync match rate), the push-based scaling flow, failure handling, and key constraints. Also adds an Autoscaling section to the existing Serverless Workers encyclopedia page linking to the new page, and a sidebar entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * merge autoscaling info with existing docs * copyedits * copyedits --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Lenny Chen <lenny.chen@temporal.io> * remove SDK from creating version * copyedits * docs: add describe-stacks command to retrieve role ARN from CloudFormation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add scaling with long-lived Workers section to serverless encyclopedia Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: self-hosted setup for serverless workers (#4476) * docs: add self-hosted setup page for serverless workers Covers enabling the Worker Controller via dynamic config, configuring AWS credentials for the Temporal server, and creating the Lambda invocation role with a CloudFormation template adapted for self-hosted. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add describe-stacks command to retrieve role ARN Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add WCI verification steps and ARN discovery tip Add how to view WCI workflows using TemporalNamespaceDivision filter to both the self-hosted setup page and the deploy guide. Add aws sts get-caller-identity tip for finding the server's IAM ARN. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove WCI filter tip from deploy guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: mark IAM section as Cloud-only, link to self-hosted setup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add (Cloud only) to IAM heading Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move Cloud-only callout to top of IAM section Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address PR review feedback on self-hosted setup page - Remove first paragraph (Cloud comparison) - Add brief overview of the three setup steps - Use "Worker Controller Instance (WCI)" instead of "Worker Controller" - Move dynamic config reference table out (belongs on dynamic config page) - Replace verify section with Next steps linking to deploy guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * prettier * docs: reference self-hosted setup page in deploy guide prerequisites Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Apply suggestions from code review Co-authored-by: Stefan Richter <stefan@02strich.de> * Apply suggestions from code review Co-authored-by: Stefan Richter <stefan@02strich.de> * copyedits * copyedits --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Stefan Richter <stefan@02strich.de> * feat: serverless worker - typescript (#4468) * feat: serverless worker - 
typescript * docs: fix TS serverless worker review items - Add HOME=/tmp to TS create-function env vars - Fix makeOtelPlugins() to makeOtelPlugin() to match SDK export - Add versioning behavior requirement note Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: fix ADOT layer instructions and add X-Ray permissions note - Correct two-layer instruction to single ADOT Node.js layer (includes collector) - Add AWS_LAMBDA_EXEC_WRAPPER env var requirement - Add AWSXRayDaemonWriteAccess policy requirement with silent failure warning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: revert to two-layer ADOT setup, keep permissions note Revert to the original two-layer ADOT instruction from the SDK author. The single-layer setup was not verified end-to-end. Keep the AWSXRayDaemonWriteAccess and AWS_LAMBDA_EXEC_WRAPPER additions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add custom OTel collector config and setup instructions for TS The default ADOT collector config does not route OTLP data to the traces pipeline. Add the custom collector config YAML that wires OTLP to both traces (X-Ray) and metrics (CloudWatch EMF). Document the required env vars and IAM permissions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: fix OTel collector config to use logging exporter The standalone ADOT collector layer (v0.40.0) does not support the debug exporter. Replace with logging, which is the supported equivalent. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update TS OTel to match sample (two layers, debug exporter) Match the TS sample exactly: - Two ADOT layers (JS layer + standalone collector) - debug exporter (not logging) - AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument - OPENTELEMETRY_COLLECTOR_CONFIG_URI (not _FILE) - tracing-config Mode=Active - IAM permissions for xray, cloudwatch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: fix AWS_LAMBDA_EXEC_WRAPPER to /opt/otel-handler for Node.js The Node.js ADOT layer ships /opt/otel-handler, not /opt/otel-instrument (which is the Python wrapper). Verified end-to-end: X-Ray traces confirmed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: python serverless worker (#4467) * feat: python serverless worker * docs: add versioning behavior samples, packaging fixes, heading cleanup - Add versioning behavior (PINNED) to Python and Go code samples - Add VersioningBehavior import from temporalio.common for Python - Fix Python packaging to zip deps first then add app files - Change Go samples from AutoUpgrade to Pinned - Add Python and TypeScript tabs to deploy guide - Clean up deploy guide headings (remove articles) - Add Serverless Worker link in deploy guide intro Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Revert "docs: add versioning behavior samples, packaging fixes, heading cleanup" This reverts commit 6fe5cc5. 
* docs: add Python versioning behavior samples and packaging fix - Add versioning behavior (PINNED) example to Python SDK page and deploy guide - Fix Python packaging: use --platform manylinux2014_x86_64 for Lambda - Zip deps first, then add app files separately Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: restructure Python SDK page for readability Highlight run_worker lines, explain configure callback right after the code block, move versioning behavior below. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: explain deployment name and build ID with links to versioning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Apply suggestions from code review Co-authored-by: Lenny Chen <55669665+lennessyy@users.noreply.github.com> * docs: add custom OTel collector config and setup instructions for Python The default ADOT collector config does not route OTLP data to the traces pipeline. Add the custom collector config YAML that wires OTLP to both traces (X-Ray) and metrics (CloudWatch EMF). Document the required env var and IAM permissions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: fix OTel collector config to use logging exporter The standalone ADOT collector layer (v0.40.0) does not support the debug exporter. Replace with logging, which is the supported equivalent. Verified end-to-end: traces now appear in X-Ray. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update Python OTel to use language-specific ADOT layer Use the ADOT Python layer (includes auto-instrumentation + collector) instead of standalone collector layer. Matches the verified sample setup. Use debug exporter (supported by the newer collector in the language layer). Add AWS_LAMBDA_EXEC_WRAPPER and tracing-config Mode=Active. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: fix bad merge --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * copyedits * clarify go sample otel instructions * copyedits * fix: darken interactive demo button in dark mode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update interactive demo * docs: use captioned image component * docs: address review feedback - versioning behavior, code fence, CaptionedImage - Mention worker-level default versioning behavior across all SDK pages - Fix missing Python OTel code fence - Update prerequisite to include worker-level default option - Replace ThemedImage with CaptionedImage on encyclopedia page - Add "not to scale" note to lifecycle diagram caption - Remove duplicate env vars sentence in deploy guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Python lambda-worker-otel optional dependency note Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: use JSON syntax for --environment to avoid parsing errors The shorthand Variables={} syntax fails when values contain colons or periods, which Temporal addresses always have. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify IAM role and External ID in deploy guide - Clarify which role to use in create-version step (not execution role or user IAM role) - Fix External ID description: user-chosen, not provided by Temporal Cloud - Add Go config file resolution order to match Python and TS pages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add AWS namespace prerequisite for Lambda deploy guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * add links to quickstart * update execution role arn parameter table * docs: add links to samples * remove redundant sentence * use multi-line command in self-hosted guide * add validate connection step * uses pinned versioning behavior; uses snipsync to sync code snippets * add prerelease banners * add version requirements * docs: add Activity Heartbeat tip for long-running Activities on serverless (#4495) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add note about lambda versioning best practice (#4503) * docs: add note about lambda versioning best practice * fix: move admonition closing ::: to its own line The linter merged the ::: closing tag onto the preceding text line, which breaks MDX admonition rendering. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * minor copy edit * indent admonition * Apply suggestions from code review Co-authored-by: Milecia McG <47196133+flippedcoder@users.noreply.github.com> * address review comments * Apply suggestions from code review Co-authored-by: Milecia McG <47196133+flippedcoder@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Milecia McG <47196133+flippedcoder@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Milecia McG <47196133+flippedcoder@users.noreply.github.com> * address feedback * Apply suggestions from code review Co-authored-by: Milecia McG <47196133+flippedcoder@users.noreply.github.com> * link to env config pages * troubleshooting: serverless workers (#4521) * troubleshooting: serverless workers * copy edits * reorder checks * update troubleshooting guide * add brief wci explanation * add link to troubleshooting docs * address feedback * clarify WCI role * minor copyedit * fix broken links * fix broken links * address adot issue * adjust demo contrast * small interactive demo polush * update demo sample to pinned * small polish --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Brandon Chavis <brandon.chavis@temporal.io> Co-authored-by: Stefan Richter <stefan@02strich.de> Co-authored-by: Milecia McG <47196133+flippedcoder@users.noreply.github.com>
What does this PR do?
Notes to reviewers
┆Attachments: EDU-6304 troubleshooting: serverless workers