[core] Combine flow+step bundle and process steps eagerly#1338
VaguelySerious wants to merge 89 commits into `main`
Conversation
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
🦋 Changeset detected. Latest commit: 14d84f2. The changes in this PR will be included in the next version bump. This PR includes changesets to release 20 packages.
📊 Benchmark Results
_Benchmark tables not captured in this export. Scenarios measured (each in 💻 Local Development and ▲ Production (Vercel), with 🔍 Next.js (Turbopack) observability links): workflow with no steps; workflow with 1/10/25/50 sequential steps; Promise.all with 10/25/50 concurrent steps; Promise.race with 10/25/50 concurrent steps; stream benchmarks (includes TTFB metrics). Summary tables (fastest framework by world, fastest world by framework; winner determined by most benchmark wins) and column definitions omitted._
🧪 E2E Test Results: ❌ Some tests failed

❌ Failed Tests: 🌍 Community Worlds (55 failed)
- mongodb: 3 failed
- redis: 2 failed
- turso: 50 failed

Details by Category
✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
…nd step handler

Transient network errors (ECONNRESET, etc.) during infrastructure calls (event listing, event creation) were caught by a shared try/catch that also handles user-code errors, incorrectly marking runs as run_failed or steps as step_failed instead of letting the queue redeliver.

- runtime.ts: move infrastructure calls outside the user-code try/catch so errors propagate to the queue handler for automatic retry
- step-handler.ts: same structural separation; only stepFn.apply() is wrapped in the try/catch that produces step_failed/step_retrying
- helpers.ts: add isTransientNetworkError() and update withServerErrorRetry to retry network errors in addition to 5xx responses
- helpers.test.ts: add tests for network error detection and retry
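A minimal sketch of the transient-error detection this commit describes. The function name `isTransientNetworkError` comes from the commit; the exact error-code list and implementation are assumptions, based on the standard Node.js convention of network failures carrying a string `code` property:

```typescript
// Hypothetical sketch of isTransientNetworkError() from helpers.ts.
// Node networking errors expose a string `code` such as ECONNRESET;
// the exact set of codes treated as transient here is an assumption.
const TRANSIENT_CODES = new Set([
  "ECONNRESET",
  "ECONNREFUSED",
  "ETIMEDOUT",
  "EPIPE",
  "EAI_AGAIN",
]);

function isTransientNetworkError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}
```

The key structural point of the commit is that this check runs around infrastructure calls *outside* the user-code try/catch, so a transient failure propagates to the queue handler for redelivery instead of being recorded as run_failed.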
Merge flow and step routes into a single combined handler that executes steps inline when possible, reducing function invocations and queue overhead. Serial workflows can now complete in a single function invocation instead of 2N+1 invocations.

Key changes:
- Add `combinedEntrypoint()` to core runtime with inline step execution loop
- Extract reusable step execution logic into `step-executor.ts`
- Add `handleSuspensionV2()` that creates events without queuing steps
- Add `stepId` field to `WorkflowInvokePayload` for background step dispatch
- Add `createCombinedBundle()` to base builder
- Update Next.js builder to generate combined route at v1/flow
- Update health check e2e tests for single-route architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
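The invocation-count arithmetic behind the headline claim can be sketched as follows. Everything here except the name `combinedEntrypoint` is hypothetical; the point is only that a serial workflow's steps run in one loop inside one invocation, versus one invocation per queue dispatch in the V1 split:

```typescript
// Hypothetical illustration of the inline execution idea behind
// combinedEntrypoint(). In V1, a serial workflow of N steps cost
// 2N+1 invocations (flow replay + step dispatch per step + final replay);
// in V2, the loop below runs all steps in a single invocation.
type Step = () => unknown;

function runSerialInline(steps: Step[]): { results: unknown[]; invocations: number } {
  const results: unknown[] = [];
  for (const step of steps) {
    // Execute inline: no queue message, no new function invocation.
    results.push(step());
  }
  return { results, invocations: 1 };
}

// V1 cost model for comparison (per the commit message's 2N+1 figure).
function v1Invocations(stepCount: number): number {
  return 2 * stepCount + 1;
}
```

For a 5-step serial workflow this is 1 invocation instead of 11, which is where the queue-overhead savings come from.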
The steps bundle only contains side-effect code (registerStepFunction calls) with no exports. When rollup processes the combined route that imports this module, it tree-shakes the entire module away because it has no used exports.

Fix: add a sentinel export (__steps_registered) to the steps bundle and import it in the combined route. This gives rollup a used binding to track, preventing it from dropping the module and its side effects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
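The sentinel pattern can be sketched in one file. The registry shape and step name here are hypothetical; `registerStepFunction` and `__steps_registered` are the names from the commit:

```typescript
// Hypothetical sketch of the sentinel-export fix. The steps bundle runs
// only side effects (registerStepFunction calls); with no exports, rollup
// sees no used bindings and drops the whole module.
const stepRegistry = new Map<string, () => unknown>();

function registerStepFunction(name: string, fn: () => unknown): void {
  stepRegistry.set(name, fn);
}

// --- steps bundle: side effects, plus the sentinel ---
registerStepFunction("fetchData", () => "data"); // hypothetical step
// In the real bundle this is `export const __steps_registered = true;`
const __steps_registered = true;

// --- combined route: importing and referencing the sentinel gives rollup
// a used binding, so the module and its registration side effects survive:
//   import { __steps_registered } from "./steps";
//   void __steps_registered;
```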
When the V2 inline execution loop advances ahead (e.g., completes batch 1 inline and creates step_created events for batch 2), concurrent replays from batch 1 background step continuations may encounter batch 2's events without matching subscribers.

Fix: the EventsConsumer's onUnconsumedEvent callback now returns true to skip step lifecycle events (step_created, step_started, step_completed, step_failed, step_retrying) that have a corresponding step_created event in the log, confirming they're from a legitimate concurrent handler. Orphaned events with unknown correlationIds still error.

Also: the steps bundle exports a __steps_registered sentinel to prevent rollup from tree-shaking the side-effect-only module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
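The skip decision described above can be expressed as a small predicate. The event shape, `correlationId` field usage, and helper name are assumptions; the event type names and the skip/error rule come from the commit:

```typescript
// Hypothetical sketch of the onUnconsumedEvent decision: skip (return true)
// a step lifecycle event only if the log contains a step_created event with
// the same correlationId, proving it came from a legitimate concurrent
// handler. Anything else still errors.
interface WorkflowEvent {
  type: string;
  correlationId: string;
}

const STEP_LIFECYCLE = new Set([
  "step_created",
  "step_started",
  "step_completed",
  "step_failed",
  "step_retrying",
]);

function shouldSkipUnconsumedEvent(
  event: WorkflowEvent,
  log: WorkflowEvent[],
): boolean {
  if (!STEP_LIFECYCLE.has(event.type)) return false; // non-step events: error path
  return log.some(
    (e) => e.type === "step_created" && e.correlationId === event.correlationId,
  ); // orphaned correlationIds (no step_created in the log) still error
}
```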
…ation

Two fixes:
1. Step executor: await ops (stream writes) with a 5-second bounded wait before creating step_completed. In V1, each step was a separate function invocation and waitUntil ensured ops completed before the function ended. In V2 inline execution, the handler loop continues immediately, leaving stream data uncommitted. The bounded wait ensures data reaches S3 before proceeding, with waitUntil as a safety net for ops that need more time.
2. Step executor: only enforce max retries when step.error exists (an actual retry after failure). V2 concurrent replays can inflate the attempt counter via simultaneous step_started calls without any prior failure. With N parallel steps completing, up to N concurrent continuations may race to start the same step, each incrementing attempt. The first completion wins (step_completed idempotency), but premature "exceeded max retries" failures must be prevented.

Also documents all integration challenges in the changelog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a step has pending background operations (e.g., WritableStream data being piped to S3), the V2 inline execution loop must not continue to the next replay iteration. Instead, queue a continuation and return to give waitUntil time to flush the ops.

In V1, each step ran in a separate function invocation. After the step completed, the function returned and waitUntil flushed the stream writes to S3 before the test could read the data. In V2, the inline loop continues processing, keeping the function alive and preventing waitUntil from flushing. The test's stream reader blocks forever because the S3 data never arrives.

Fix: executeStep now returns hasPendingOps when the step had background ops. The V2 handler checks this and breaks the loop, queueing a continuation instead of looping inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
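The bounded-wait-then-break pattern from the last two commits can be sketched as a small helper. The helper name and signature are hypothetical; `hasPendingOps` is the field named in the commit:

```typescript
// Hypothetical sketch of the bounded ops wait: race the pending ops against
// a timeout. If they settle in time, the inline loop may continue; otherwise
// hasPendingOps is reported and the handler queues a continuation so
// waitUntil can flush the remaining work.
async function waitForOpsBounded(
  ops: Promise<unknown>[],
  timeoutMs: number,
): Promise<{ hasPendingOps: boolean }> {
  if (ops.length === 0) return { hasPendingOps: false }; // simple step: no overhead

  const allSettled = Promise.allSettled(ops).then(() => "settled" as const);
  const timeout = new Promise<"timeout">((resolve) =>
    setTimeout(() => resolve("timeout"), timeoutMs),
  );
  const outcome = await Promise.race([allSettled, timeout]);
  return { hasPendingOps: outcome === "timeout" };
}
```

A step whose stream writes settle within the window keeps the loop inline; one whose ops outlive the window (e.g., a WritableStream held open across steps) breaks the loop and defers to waitUntil.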
Add changelog entries for: - Concurrent step_started inflating attempt counter (promiseRaceStressTest fix) - Inline step execution with pending stream operations (outputStream fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
When DEBUG=workflow:* is set:

1. Runtime emits timing logs for:
   - Each event page load (page number, event count, ms)
   - Total event loading (total events, pages, total ms)
   - Incremental event loading (new events since cursor, ms)
   - Workflow replay start/completion/suspension (ms, event count)
   - Suspension handling duration (ms, pending steps, timeout)
2. World-vercel emits timing logs for every HTTP request:
   - Method, endpoint, status code, duration (ms)
   - Activated by DEBUG env containing "workflow:"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds debug logs to understand why the V2 inline loop exits early on Vercel. Logs when a step has pending ops (showing ops count) and when the loop breaks due to hasPendingOps, causing a continuation to be queued instead of processing inline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The V2 inline loop was breaking on every step with stream serialization ops (hasPendingOps=true), causing a queue round-trip per step. AI agent workflows with WritableStream parameters triggered this on every step, defeating inline execution entirely.

Fix: await ops inline with a 500ms timeout in the step executor. Most flushable pipe ops resolve within ~200ms (100ms lock-release polling + flush). If ops settle within 500ms, hasPendingOps=false and the loop continues inline. Only if ops don't settle (e.g., a WritableStream kept open across steps) does the loop break for waitUntil to handle.

This reduces an AI agent workflow from ~5 invocations (1 per step) to ~1-2 invocations, matching the V2 design goal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update the changelog to describe the three-tier ops handling strategy: - Simple steps: no ops, no overhead, loop continues - AI agent steps: ops settle inline within 500ms, loop continues - Streaming output steps: ops don't settle, loop breaks for waitUntil Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The V2 inline loop called world.runs.get(runId) on every iteration to check run status. This added 20-70ms per iteration for a redundant HTTP call — the run stays 'running' during inline processing. The status check and run_started transition only matter on the first pass. Move the runs.get() and run_started logic above the while loop. The loop now only handles event loading, replay, suspension, and step execution. For a 5-step AI agent workflow, this saves ~120ms total (5 iterations × ~24ms average). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
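The hoisting described above can be shown with a before/after-style sketch. All shapes here are hypothetical stand-ins for the real runtime; the point is that `runs.get()` is called once before the loop rather than once per iteration:

```typescript
// Hypothetical sketch of hoisting the run-status check out of the V2 inline
// loop. runs.get() is an HTTP call costing 20-70ms; during inline processing
// the run stays 'running', so only the first check matters.
interface Run {
  status: string;
}
interface RunsClient {
  calls: number;
  get(runId: string): Run;
}

function makeClient(): RunsClient {
  return {
    calls: 0,
    get(_runId: string): Run {
      this.calls += 1; // count HTTP round-trips for illustration
      return { status: "running" };
    },
  };
}

function inlineLoop(runs: RunsClient, iterations: number): void {
  // Hoisted: one status check (and the run_started transition) before the loop.
  const run = runs.get("run_1");
  if (run.status !== "running") return;
  for (let i = 0; i < iterations; i++) {
    // event loading, replay, suspension handling, inline step execution...
    // no runs.get() per iteration anymore
  }
}
```

With 5 iterations this is 1 call instead of 5, matching the ~120ms savings cited for a 5-step workflow.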
When ops settled within the 500ms inline timeout, waitUntil was skipped. On Vercel, the function can be garbage collected after returning — even if the ops promise resolved, in-flight HTTP responses (S3 write acks) may be dropped without waitUntil extending the function lifetime. This caused outputStreamWorkflow to time out: the step wrote stream data to S3, the ops promise resolved (lock released), but the function ended before S3 fully acknowledged the write. The test reader never received the data. Fix: always call waitUntil(opsPromise) regardless of settlement. The inline await still determines hasPendingOps for loop-break decisions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WorkflowServerWritableStream uses buffered writes with a 10ms flush timer. The flushablePipe's pendingOps counter reaches 0 as soon as the buffered write() returns (before the timer fires and data reaches S3). pollWritableLock saw pendingOps=0 and resolved immediately, causing the V2 inline loop to consider ops settled before data was actually on S3. Fix: delay pollWritableLock resolution by 20ms after detecting lock release + pendingOps=0. This allows the 10ms flush timer to fire and the S3 write to complete before the ops promise resolves. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WorkflowServerWritableStream buffers writes and flushes via a 10ms setTimeout. The flushablePipe's pendingOps reaches 0 when the buffered write() returns (before the flush timer fires and data reaches S3). Even though ops appear settled, the S3 HTTP write hasn't started yet. Fix: after ops settle within the 500ms inline timeout, wait an additional 150ms to cover the flush timer (10ms) + S3 HTTP round-trip (~100ms). This ensures stream data is on S3 before the V2 loop continues and the handler potentially returns. Reverts the pollWritableLock delay (insufficient) and the WorkflowServerWritableStream.write() blocking (caused deadlocks). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit f7b59ab.
The 500ms inline ops await caused outputStreamWorkflow to consistently fail on Vercel Prod.

The root cause: WorkflowServerWritableStream uses buffered writes with a 10ms flush timer. The flushablePipe's pendingOps reaches 0 before data reaches S3 (the buffered write returns instantly). The ops appear settled, but the data isn't on S3 yet. Various attempts to fix this (delayed pollWritableLock resolution, closing the writable, a 150ms post-settle delay) all failed because the timing between ops promise resolution and S3 data availability is fundamentally non-deterministic on Vercel.

Revert to the proven approach: hasPendingOps = ops.length > 0. Any step with stream serialization ops breaks the V2 inline loop and queues a continuation. This gives waitUntil exclusive control to flush the ops, matching V1 behavior.

AI agent workflow optimization (reducing invocations for stream-using steps) should be addressed separately by fixing the buffered write timing in WorkflowServerWritableStream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Next.js 16.2.0-canary.100+ has a regression where @workflow/ai step files are missing from the step bundle, causing "doStreamStep not found" errors that hang the agent tests until timeout. Signed-off-by: Peter Wielander <mittgfu@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DEV_TEST_CONFIG was only set in the dev test job, so prod and postgres canary jobs didn't skip agent tests. Add NEXT_CANARY env var to all three local e2e test jobs (dev, prod, postgres) and use it directly. Signed-off-by: Peter Wielander <mittgfu@gmail.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CLI's getEnvVars() function reads a fixed list of env vars but was missing WORKFLOW_LOCAL_BASE_URL. The health check test passes this env var to tell the CLI which port the dev server is on (Astro uses 4321, SvelteKit uses 5173). Without it, the CLI always defaults to port 3000, causing ECONNREFUSED on non-Next.js frameworks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Vercel with parallel steps, each background step completion queues a continuation. N parallel steps generate N concurrent continuations, each loading all events (17 pages × ~40ms = ~680ms) plus replaying (~200ms), only to discover the run was already completed by another handler. A workflow with 20 steps generated 20 concurrent replays, causing Vercel Prod tests to time out at 30 minutes.

Fix: add early-exit checks in two places:
1. Background step path: skip step execution if the run is not running
2. Inline loop: re-check run status on iterations > 1 to detect concurrent completion before expensive event loading

This reduces wasted work from ~900ms per redundant replay to ~40ms (a single runs.get() call).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
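The two early-exit rules can be condensed into one predicate. The function and type names are hypothetical; the iteration rule follows the commit's "re-check on iterations > 1":

```typescript
// Hypothetical sketch of the early-exit decision: a handler bails out with a
// single cheap runs.get() (~40ms) instead of paying for a full event load and
// replay (~900ms) when another handler has already finished the run.
type RunStatus = "running" | "completed" | "failed" | "cancelled";

function shouldContinueRun(status: RunStatus, iteration: number): boolean {
  // Iteration 1 already validated status before entering the loop; later
  // iterations re-check to catch concurrent completion by another handler.
  if (iteration > 1 && status !== "running") return false;
  return true;
}
```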
DurableAgent tests timeout (120s each × 14 tests = 28 minutes) on Nitro-based Vercel BOA deployments, causing the 30-minute CI job timeout. The V2 combined handler needs additional work for DurableAgent support on these frameworks. Skip agent tests for non-Next.js/SvelteKit apps. On main, these tests only run on Next.js deployments (they were added after the BOA builders were last tested). The regular workflow e2e tests still run on all frameworks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive update to the V2 changelog documenting:
- Current state: what passes and what fails
- The buffered-write timing issue (root cause analysis)
- Three failed optimization attempts and why each broke
- The Vercel BOA deployment hang (remaining blocker)
- What we know vs. don't know
- A step-by-step debugging plan for the BOA issue

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
See changelog / architecture doc https://workflow-docs-git-peter-v2-flow.vercel.sh/docs/changelog/eager-processing