DurableAgent: various compatibility fixes#1385

Open
VaguelySerious wants to merge 18 commits into main from peter/combined-durable-agent-fixes

Conversation

VaguelySerious (Member) commented Mar 14, 2026

Closes #1303: Support prepareStep on DurableAgent constructor with stream-level override
Closes #1302: Add InferDurableAgentTools and InferDurableAgentUIMessage type helpers
Closes #1376: Fix AI_DownloadError for image URLs by defaulting to no-op download
Closes #1296: Add telemetry span support (ai.streamText.doStream, ai.toolCall) via optional @opentelemetry/api
Closes #848: Allow LanguageModelV3ToolResultOutput results to pass through without JSON parsing
Closes #975: Document the necessity of sendStart: false when writing custom UIMessageChunks
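
For readers unfamiliar with the prepareStep change (#1303), here is a minimal sketch of the precedence rule: a callback passed at stream time replaces the one passed to the constructor. `SketchAgent`, its option shapes, and `maxSteps` are hypothetical stand-ins for illustration, not the actual @workflow/ai API.

```typescript
// Hypothetical, simplified shapes; not the real @workflow/ai types.
type StepSettings = { system?: string };
type PrepareStep = (ctx: { stepNumber: number }) => StepSettings;

interface AgentOptions {
  prepareStep?: PrepareStep;
}

interface StreamOptions {
  prepareStep?: PrepareStep;
  maxSteps: number;
}

class SketchAgent {
  private readonly options: AgentOptions;

  constructor(options: AgentOptions = {}) {
    this.options = options;
  }

  // Returns the settings that the selected prepareStep produced for each step.
  stream(streamOptions: StreamOptions): StepSettings[] {
    // The stream-level callback wins; the constructor-level one is the fallback.
    const prepareStep = streamOptions.prepareStep ?? this.options.prepareStep;
    const perStep: StepSettings[] = [];
    for (let stepNumber = 0; stepNumber < streamOptions.maxSteps; stepNumber++) {
      perStep.push(prepareStep ? prepareStep({ stepNumber }) : {});
    }
    return perStep;
  }
}

const agent = new SketchAgent({
  prepareStep: ({ stepNumber }) => ({ system: `constructor step ${stepNumber}` }),
});

// No stream-level callback: the constructor-level one runs for every step.
const fromConstructor = agent.stream({ maxSteps: 2 });

// A stream-level callback replaces the constructor-level one for this call only.
const fromStream = agent.stream({
  maxSteps: 2,
  prepareStep: () => ({ system: "stream override" }),
});
```

The sketch treats the override as a full replacement rather than a merge of both callbacks, which is how the PR description reads; the real implementation may differ in detail.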

Co-Authored-By: @nicoalbanese

… fix, telemetry spans

- Add InferDurableAgentTools and InferDurableAgentUIMessage type helpers (#1302)
- Support prepareStep on DurableAgent constructor with stream-level override (#1303)
- Fix AI_DownloadError for image URLs by defaulting to no-op download (#1376)
- Add telemetry span support (ai.streamText.doStream, ai.toolCall) via optional @opentelemetry/api (#1296)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@VaguelySerious VaguelySerious requested a review from a team as a code owner March 14, 2026 00:18
changeset-bot bot commented Mar 14, 2026

🦋 Changeset detected

Latest commit: 094fe61

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@workflow/ai Patch

github-actions bot (Contributor) commented Mar 14, 2026

🧪 E2E Test Results

Some tests failed

Summary

Passed Failed Skipped Total
✅ ▲ Vercel Production 747 0 67 814
✅ 💻 Local Development 770 0 118 888
✅ 📦 Local Production 770 0 118 888
✅ 🐘 Local Postgres 770 0 118 888
✅ 🪟 Windows 71 0 3 74
❌ 🌍 Community Worlds 116 55 15 186
✅ 📋 Other 195 0 27 222
Total 3439 55 466 3960

❌ Failed Tests

🌍 Community Worlds (55 failed)

mongodb (3 failed):

  • hookWorkflow is not resumable via public webhook endpoint
  • webhookWorkflow
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously

redis (2 failed):

  • hookWorkflow is not resumable via public webhook endpoint
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously

turso (50 failed):

  • addTenWorkflow
  • addTenWorkflow
  • wellKnownAgentWorkflow (.well-known/agent)
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • importedStepOnlyWorkflow
  • hookWorkflow
  • hookWorkflow is not resumable via public webhook endpoint
  • webhookWorkflow
  • sleepingWorkflow
  • parallelSleepWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument
  • cancelRun - cancelling a running workflow
  • cancelRun via CLI - cancelling a running workflow
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control)

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 67 0 7
✅ example 67 0 7
✅ express 67 0 7
✅ fastify 67 0 7
✅ hono 67 0 7
✅ nextjs-turbopack 72 0 2
✅ nextjs-webpack 72 0 2
✅ nitro 67 0 7
✅ nuxt 67 0 7
✅ sveltekit 67 0 7
✅ vite 67 0 7
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 65 0 9
✅ express-stable 65 0 9
✅ fastify-stable 65 0 9
✅ hono-stable 65 0 9
✅ nextjs-turbopack-canary 54 0 20
✅ nextjs-turbopack-stable 71 0 3
✅ nextjs-webpack-canary 54 0 20
✅ nextjs-webpack-stable 71 0 3
✅ nitro-stable 65 0 9
✅ nuxt-stable 65 0 9
✅ sveltekit-stable 65 0 9
✅ vite-stable 65 0 9
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 65 0 9
✅ express-stable 65 0 9
✅ fastify-stable 65 0 9
✅ hono-stable 65 0 9
✅ nextjs-turbopack-canary 54 0 20
✅ nextjs-turbopack-stable 71 0 3
✅ nextjs-webpack-canary 54 0 20
✅ nextjs-webpack-stable 71 0 3
✅ nitro-stable 65 0 9
✅ nuxt-stable 65 0 9
✅ sveltekit-stable 65 0 9
✅ vite-stable 65 0 9
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 65 0 9
✅ express-stable 65 0 9
✅ fastify-stable 65 0 9
✅ hono-stable 65 0 9
✅ nextjs-turbopack-canary 54 0 20
✅ nextjs-turbopack-stable 71 0 3
✅ nextjs-webpack-canary 54 0 20
✅ nextjs-webpack-stable 71 0 3
✅ nitro-stable 65 0 9
✅ nuxt-stable 65 0 9
✅ sveltekit-stable 65 0 9
✅ vite-stable 65 0 9
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 71 0 3
❌ 🌍 Community Worlds
App Passed Failed Skipped
✅ mongodb-dev 3 0 2
❌ mongodb 51 3 3
✅ redis-dev 3 0 2
❌ redis 52 2 3
✅ turso-dev 3 0 2
❌ turso 4 50 3
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 65 0 9
✅ e2e-local-postgres-nest-stable 65 0 9
✅ e2e-local-prod-nest-stable 65 0 9

github-actions bot (Contributor) commented Mar 14, 2026

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 0.039s (-8.7% 🟢) 1.006s (~) 0.967s 10 1.00x
💻 Local Express 0.046s (+2.0%) 1.005s (-2.0%) 0.960s 10 1.18x
💻 Local Next.js (Turbopack) 0.049s 1.005s 0.956s 10 1.27x
🌐 Redis Next.js (Turbopack) 0.054s 1.005s 0.952s 10 1.39x
🐘 Postgres Next.js (Turbopack) 0.061s 1.011s 0.950s 10 1.57x
🐘 Postgres Express 0.063s (-1.1%) 1.012s (~) 0.949s 10 1.62x
🐘 Postgres Nitro 0.067s (+7.7% 🔺) 1.017s (+0.5%) 0.950s 10 1.74x
🌐 MongoDB Next.js (Turbopack) 0.132s 1.007s 0.876s 10 3.40x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 0.432s (-54.7% 🟢) 2.416s (-18.6% 🟢) 1.985s 10 1.00x
▲ Vercel Express 0.473s (+1.2%) 2.454s (-10.0% 🟢) 1.980s 10 1.10x
▲ Vercel Nitro 0.486s (-28.1% 🟢) 2.572s (+3.6%) 2.085s 10 1.13x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

workflow with 1 step

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 1.097s (-2.4%) 2.006s (~) 0.909s 10 1.00x
💻 Local Next.js (Turbopack) 1.116s 2.005s 0.889s 10 1.02x
🌐 Redis Next.js (Turbopack) 1.132s 2.007s 0.875s 10 1.03x
💻 Local Express 1.135s (+1.0%) 2.007s (~) 0.871s 10 1.04x
🐘 Postgres Next.js (Turbopack) 1.146s 2.012s 0.866s 10 1.05x
🐘 Postgres Express 1.146s (~) 2.011s (~) 0.865s 10 1.05x
🐘 Postgres Nitro 1.147s (~) 2.013s (~) 0.865s 10 1.05x
🌐 MongoDB Next.js (Turbopack) 1.306s 2.009s 0.703s 10 1.19x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 2.024s (-5.8% 🟢) 3.561s (-6.7% 🟢) 1.537s 10 1.00x
▲ Vercel Express 2.054s (-1.0%) 3.521s (-3.3%) 1.467s 10 1.01x
▲ Vercel Nitro 2.169s (-16.8% 🟢) 3.862s (-17.8% 🟢) 1.693s 10 1.07x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

workflow with 10 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 10.630s (-2.5%) 11.024s (~) 0.394s 3 1.00x
💻 Local Next.js (Turbopack) 10.802s 11.023s 0.221s 3 1.02x
🌐 Redis Next.js (Turbopack) 10.812s 11.023s 0.211s 3 1.02x
🐘 Postgres Next.js (Turbopack) 10.836s 11.037s 0.202s 3 1.02x
💻 Local Express 10.902s (~) 11.024s (~) 0.122s 3 1.03x
🐘 Postgres Express 10.932s (~) 11.038s (~) 0.106s 3 1.03x
🐘 Postgres Nitro 10.939s (~) 11.039s (~) 0.100s 3 1.03x
🌐 MongoDB Next.js (Turbopack) 12.271s 13.022s 0.751s 3 1.15x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 16.422s (-3.1%) 17.403s (-4.7%) 0.981s 2 1.00x
▲ Vercel Nitro 16.722s (-4.4%) 18.741s (-7.7% 🟢) 2.019s 2 1.02x
▲ Vercel Next.js (Turbopack) 18.961s (+8.2% 🔺) 20.762s (+8.0% 🔺) 1.801s 2 1.15x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 25 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 26.775s (-2.5%) 27.053s (-3.6%) 0.278s 3 1.00x
🌐 Redis Next.js (Turbopack) 26.787s 27.052s 0.265s 3 1.00x
🐘 Postgres Next.js (Turbopack) 27.016s 27.395s 0.379s 3 1.01x
🐘 Postgres Express 27.132s (~) 27.727s (-1.2%) 0.595s 3 1.01x
🐘 Postgres Nitro 27.235s (~) 28.063s (~) 0.828s 3 1.02x
💻 Local Next.js (Turbopack) 27.255s 28.054s 0.799s 3 1.02x
💻 Local Express 27.513s (~) 28.053s (~) 0.540s 3 1.03x
🌐 MongoDB Next.js (Turbopack) 30.389s 31.037s 0.649s 2 1.13x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 45.154s (+2.7%) 47.613s (+2.5%) 2.459s 2 1.00x
▲ Vercel Express 45.218s (+3.1%) 46.945s (+2.1%) 1.727s 2 1.00x
▲ Vercel Nitro 45.293s (+4.1%) 46.791s (+2.7%) 1.498s 2 1.00x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

workflow with 50 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🌐 Redis 🥇 Next.js (Turbopack) 53.414s 54.097s 0.683s 2 1.00x
🐘 Postgres Next.js (Turbopack) 53.876s 54.094s 0.218s 2 1.01x
🐘 Postgres Nitro 54.282s (~) 55.097s (~) 0.815s 2 1.02x
🐘 Postgres Express 54.301s (~) 55.100s (~) 0.799s 2 1.02x
💻 Local Nitro 54.970s (-3.0%) 55.107s (-3.5%) 0.137s 2 1.03x
💻 Local Next.js (Turbopack) 55.942s 56.100s 0.158s 2 1.05x
💻 Local Express 56.868s (~) 57.102s (~) 0.234s 2 1.06x
🌐 MongoDB Next.js (Turbopack) 60.595s 61.067s 0.472s 2 1.13x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 93.323s (-3.2%) 94.121s (-4.5%) 0.798s 1 1.00x
▲ Vercel Express 94.753s (+2.1%) 97.186s (+3.1%) 2.433s 1 1.02x
▲ Vercel Nitro 97.185s (+1.5%) 98.493s (+0.6%) 1.308s 1 1.04x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

Promise.all with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🌐 Redis 🥇 Next.js (Turbopack) 1.395s 2.006s 0.612s 15 1.00x
🐘 Postgres Nitro 1.406s (-2.5%) 2.010s (~) 0.605s 15 1.01x
🐘 Postgres Express 1.426s (-3.8%) 2.010s (~) 0.584s 15 1.02x
🐘 Postgres Next.js (Turbopack) 1.448s 2.011s 0.563s 15 1.04x
💻 Local Nitro 1.503s (+0.8%) 2.005s (~) 0.502s 15 1.08x
💻 Local Next.js (Turbopack) 1.539s 2.005s 0.466s 15 1.10x
💻 Local Express 1.564s (+5.7% 🔺) 2.006s (~) 0.442s 15 1.12x
🌐 MongoDB Next.js (Turbopack) 2.159s 3.008s 0.850s 10 1.55x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.150s (-6.1% 🟢) 3.703s (-5.5% 🟢) 1.553s 9 1.00x
▲ Vercel Next.js (Turbopack) 2.251s (-10.5% 🟢) 4.018s (-1.9%) 1.768s 8 1.05x
▲ Vercel Nitro 2.305s (-14.9% 🟢) 3.938s (-10.2% 🟢) 1.633s 8 1.07x

🔍 Observability: Express | Next.js (Turbopack) | Nitro

Promise.all with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Express 2.566s (-1.4%) 3.013s (~) 0.447s 10 1.00x
🌐 Redis Next.js (Turbopack) 2.582s 3.009s 0.427s 10 1.01x
🐘 Postgres Nitro 2.595s (~) 3.012s (~) 0.417s 10 1.01x
💻 Local Nitro 2.637s (-7.7% 🟢) 3.007s (-10.0% 🟢) 0.370s 10 1.03x
💻 Local Next.js (Turbopack) 2.751s 3.107s 0.356s 10 1.07x
🐘 Postgres Next.js (Turbopack) 2.759s 3.213s 0.454s 10 1.08x
💻 Local Express 2.978s (+0.7%) 3.564s (+7.7% 🔺) 0.586s 9 1.16x
🌐 MongoDB Next.js (Turbopack) 4.670s 5.177s 0.507s 6 1.82x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.567s (-4.4%) 4.252s (+2.1%) 1.685s 8 1.00x
▲ Vercel Express 2.843s (+12.8% 🔺) 3.944s (-10.1% 🟢) 1.100s 8 1.11x
▲ Vercel Next.js (Turbopack) 2.890s (+8.4% 🔺) 4.524s (-2.4%) 1.635s 7 1.13x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.all with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 3.916s (-1.8%) 4.444s (-6.1% 🟢) 0.528s 7 1.00x
🐘 Postgres Express 4.020s (+0.7%) 4.446s (~) 0.426s 7 1.03x
🌐 Redis Next.js (Turbopack) 4.133s 5.013s 0.880s 6 1.06x
🐘 Postgres Next.js (Turbopack) 4.540s 5.017s 0.477s 6 1.16x
💻 Local Nitro 7.038s (-13.8% 🟢) 7.416s (-17.8% 🟢) 0.378s 5 1.80x
💻 Local Next.js (Turbopack) 7.244s 7.516s 0.273s 4 1.85x
💻 Local Express 8.511s (+4.4%) 9.022s (+2.9%) 0.510s 4 2.17x
🌐 MongoDB Next.js (Turbopack) 9.758s 10.348s 0.590s 3 2.49x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.898s (-15.7% 🟢) 4.312s (-16.1% 🟢) 1.413s 7 1.00x
▲ Vercel Nitro 3.676s (+20.1% 🔺) 5.290s (+15.7% 🔺) 1.614s 6 1.27x
▲ Vercel Next.js (Turbopack) 3.739s (-1.7%) 5.321s (-3.6%) 1.583s 6 1.29x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.race with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🌐 Redis 🥇 Next.js (Turbopack) 1.330s 2.007s 0.676s 15 1.00x
🐘 Postgres Nitro 1.407s (-1.0%) 2.010s (~) 0.603s 15 1.06x
🐘 Postgres Next.js (Turbopack) 1.437s 2.011s 0.574s 15 1.08x
💻 Local Nitro 1.477s (-3.6%) 2.005s (~) 0.528s 15 1.11x
🐘 Postgres Express 1.496s (+8.1% 🔺) 2.077s (+3.3%) 0.581s 15 1.12x
💻 Local Next.js (Turbopack) 1.531s 2.005s 0.475s 15 1.15x
💻 Local Express 1.539s (+0.5%) 2.005s (~) 0.466s 15 1.16x
🌐 MongoDB Next.js (Turbopack) 2.196s 3.009s 0.813s 10 1.65x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.185s (-16.2% 🟢) 3.438s (-22.8% 🟢) 1.253s 9 1.00x
▲ Vercel Next.js (Turbopack) 2.375s (+6.5% 🔺) 3.752s (~) 1.376s 8 1.09x
▲ Vercel Nitro 2.386s (+1.1%) 4.575s (+0.6%) 2.189s 7 1.09x

🔍 Observability: Express | Next.js (Turbopack) | Nitro

Promise.race with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 2.484s (-6.7% 🟢) 3.010s (-3.3%) 0.526s 10 1.00x
🌐 Redis Next.js (Turbopack) 2.560s 3.008s 0.448s 10 1.03x
🐘 Postgres Next.js (Turbopack) 2.564s 3.012s 0.448s 10 1.03x
🐘 Postgres Express 2.609s (+3.7%) 3.012s (~) 0.403s 10 1.05x
💻 Local Nitro 2.707s (-11.3% 🟢) 3.009s (-22.6% 🟢) 0.302s 10 1.09x
💻 Local Express 3.041s (+1.7%) 3.886s (+9.0% 🔺) 0.845s 8 1.22x
💻 Local Next.js (Turbopack) 3.070s 3.759s 0.689s 8 1.24x
🌐 MongoDB Next.js (Turbopack) 4.662s 5.177s 0.516s 6 1.88x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.259s (-18.2% 🟢) 3.694s (-9.9% 🟢) 1.435s 9 1.00x
▲ Vercel Nitro 2.624s (+9.4% 🔺) 4.394s (+8.4% 🔺) 1.770s 7 1.16x
▲ Vercel Next.js (Turbopack) 2.669s (+6.8% 🔺) 4.250s (+6.8% 🔺) 1.582s 8 1.18x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.race with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Express 3.911s (-1.1%) 4.590s (~) 0.679s 7 1.00x
🐘 Postgres Nitro 3.950s (-3.3%) 4.588s (+3.2%) 0.638s 7 1.01x
🌐 Redis Next.js (Turbopack) 4.255s 5.011s 0.756s 6 1.09x
🐘 Postgres Next.js (Turbopack) 4.556s 5.021s 0.466s 6 1.16x
💻 Local Nitro 7.397s (-20.4% 🟢) 8.019s (-20.0% 🟢) 0.622s 4 1.89x
💻 Local Next.js (Turbopack) 8.347s 8.767s 0.420s 4 2.13x
💻 Local Express 8.882s (+1.7%) 9.270s (+2.8%) 0.388s 4 2.27x
🌐 MongoDB Next.js (Turbopack) 9.824s 10.347s 0.523s 3 2.51x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.719s (-1.6%) 4.485s (+10.9% 🔺) 1.767s 7 1.00x
▲ Vercel Nitro 3.158s (+18.9% 🔺) 4.745s (+21.7% 🔺) 1.587s 7 1.16x
▲ Vercel Next.js (Turbopack) 3.199s (-10.0% 🟢) 5.271s (-3.5%) 2.073s 6 1.18x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Stream Benchmarks (includes TTFB metrics)

workflow with stream

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 0.138s (-30.2% 🟢) 1.003s (~) 0.009s (-18.8% 🟢) 1.016s (~) 0.877s 10 1.00x
🌐 Redis Next.js (Turbopack) 0.175s 1.000s 0.001s 1.008s 0.833s 10 1.26x
💻 Local Next.js (Turbopack) 0.178s 1.001s 0.011s 1.017s 0.839s 10 1.29x
🐘 Postgres Next.js (Turbopack) 0.198s 1.002s 0.001s 1.013s 0.815s 10 1.43x
💻 Local Express 0.207s (+6.1% 🔺) 1.003s (~) 0.012s (+7.0% 🔺) 1.019s (~) 0.812s 10 1.49x
🐘 Postgres Nitro 0.213s (-3.7%) 0.997s (~) 0.002s (+15.4% 🔺) 1.012s (~) 0.799s 10 1.54x
🐘 Postgres Express 0.218s (-1.6%) 0.993s (~) 0.002s (+25.0% 🔺) 1.012s (~) 0.794s 10 1.58x
🌐 MongoDB Next.js (Turbopack) 0.524s 0.929s 0.001s 1.009s 0.485s 10 3.78x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Next.js (Turbopack) 1.689s (-1.6%) 2.686s (-1.1%) 0.005s (~) 3.325s (-5.4% 🟢) 1.636s 10 1.00x
▲ Vercel Express 1.716s (-9.3% 🟢) 2.758s (+21.0% 🔺) 0.010s (+60.9% 🔺) 3.369s (+6.1% 🔺) 1.653s 10 1.02x
▲ Vercel Nitro 1.774s (+11.2% 🔺) 2.720s (+1.0%) 0.005s (-96.4% 🟢) 3.297s (-3.6%) 1.523s 10 1.05x

🔍 Observability: Next.js (Turbopack) | Express | Nitro

Summary

Fastest Framework by World

Winner determined by most benchmark wins

World 🥇 Fastest Framework Wins
💻 Local Nitro 12/12
🐘 Postgres Next.js (Turbopack) 6/12
▲ Vercel Express 6/12

Fastest World by Framework

Winner determined by most benchmark wins

Framework 🥇 Fastest World Wins
Express 🐘 Postgres 5/12
Next.js (Turbopack) 🌐 Redis 7/12
Nitro 💻 Local 5/12

Column Definitions

  • Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
  • TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
  • Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
  • Wall Time: Total testbench time (trigger workflow + poll for result)
  • Overhead: Testbench overhead (Wall Time - Workflow Time)
  • Samples: Number of benchmark iterations run
  • vs Fastest: How much slower compared to the fastest configuration for this benchmark
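
As a worked example of the derived columns, the sketch below recomputes Overhead and vs Fastest for the first two rows of the "workflow with no steps" local table. The `BenchmarkRow` shape is a hypothetical helper for illustration; the bot's own script is not shown on this page.

```typescript
interface BenchmarkRow {
  label: string;
  workflowTime: number; // seconds, as printed in the table
  wallTime: number; // seconds, as printed in the table
}

// Values copied from the "workflow with no steps" local development table.
const rows: BenchmarkRow[] = [
  { label: "Local / Nitro", workflowTime: 0.039, wallTime: 1.006 },
  { label: "Local / Express", workflowTime: 0.046, wallTime: 1.005 },
];

// The fastest configuration is the one with the lowest Workflow Time.
const fastest = Math.min(...rows.map((row) => row.workflowTime));

const derived = rows.map((row) => ({
  label: row.label,
  // Overhead = Wall Time - Workflow Time
  overhead: (row.wallTime - row.workflowTime).toFixed(3),
  // vs Fastest = Workflow Time / fastest Workflow Time
  vsFastest: `${(row.workflowTime / fastest).toFixed(2)}x`,
}));

for (const row of derived) {
  console.log(`${row.label}: overhead ${row.overhead}s, ${row.vsFastest}`);
}
```

Recomputing from the printed values gives 0.967s and 1.00x for Nitro, and 0.959s and 1.18x for Express. The report lists 0.960s for Express, presumably because it derives overhead from unrounded timings, so small last-digit differences are expected.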

Worlds:

  • 💻 Local: In-memory filesystem world (local development)
  • 🐘 Postgres: PostgreSQL database world (local development)
  • ▲ Vercel: Vercel production/preview deployment
  • 🌐 Turso: Community world (local development)
  • 🌐 MongoDB: Community world (local development)
  • 🌐 Redis: Community world (local development)
  • 🌐 Jazz: Community world (local development)

Signed-off-by: Peter Wielander <mittgfu@gmail.com>
- Use space separator in operation.name (not dot) to match AI SDK
- Add resource.name attribute from functionId
- Use recordException() + setStatus() for error spans (preserves stack traces)
- Add context.with() for proper parent-child span propagation
- Make recordSpan self-initializing (remove initTelemetry two-phase requirement)
- Add recordInputs/recordOutputs to TelemetrySettings for sensitive data gating
- Add gen_ai.* semantic convention attributes to doStream span
- Add ai.toolCall.args gated by recordOutputs
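
A rough sketch of how the attribute rules in this commit could fit together for a tool-call span. `TelemetrySettingsSketch` and `toolCallAttributes` are hypothetical names, `ai.toolCall.name` is included only for illustration, and the default of recording unless `recordOutputs` is explicitly `false` is an assumption rather than something the commit message states.

```typescript
// Hypothetical settings shape modelled on the commit message; not the real type.
interface TelemetrySettingsSketch {
  functionId?: string;
  recordInputs?: boolean;
  recordOutputs?: boolean;
}

type Attributes = Record<string, string>;

function toolCallAttributes(
  telemetry: TelemetrySettingsSketch,
  toolName: string,
  args: unknown,
): Attributes {
  const operationId = "ai.toolCall";
  const attributes: Attributes = {
    // Space separator (not a dot) between the operation id and functionId.
    "operation.name": telemetry.functionId
      ? `${operationId} ${telemetry.functionId}`
      : operationId,
    "ai.toolCall.name": toolName,
  };
  // resource.name is taken from functionId when one is configured.
  if (telemetry.functionId) {
    attributes["resource.name"] = telemetry.functionId;
  }
  // Per the commit message, args are gated by recordOutputs.
  // Assumption: recording is on unless it is explicitly disabled.
  if (telemetry.recordOutputs !== false) {
    attributes["ai.toolCall.args"] = JSON.stringify(args);
  }
  return attributes;
}

const redacted = toolCallAttributes(
  { functionId: "chat", recordOutputs: false },
  "weather",
  { city: "Oslo" },
);
const recorded = toolCallAttributes({}, "weather", { city: "Oslo" });
```

With `recordOutputs: false` the arguments never reach the span, which is the point of the sensitive-data gating described above.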

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VaguelySerious and others added 2 commits March 16, 2026 14:16
- Pass through LanguageModelV3ToolResultOutput from tool execute when the
  result is already a valid output type (text, json, content, error-text,
  error-json, execution-denied). This enables tools to return multimodal
  content (images, files) that models can process via vision. (#848)
- Improve sendStart JSDoc to document the messageId conflict when writing
  custom chunks before agent.stream(), guiding users to set sendStart:
  false in that scenario. (#975)
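
The pass-through check for #848 can be pictured roughly as below. The six type names come from the commit message; `ToolResultOutputSketch`, `isToolResultOutput`, and the fallback wrapping (string to `text`, everything else to `json`) are assumptions made for this sketch and may not match the actual implementation.

```typescript
// Output types listed in the commit message as valid pass-through shapes.
const PASS_THROUGH_TYPES = new Set([
  "text",
  "json",
  "content",
  "error-text",
  "error-json",
  "execution-denied",
]);

// Simplified stand-in for LanguageModelV3ToolResultOutput.
type ToolResultOutputSketch = { type: string; value?: unknown };

function isToolResultOutput(result: unknown): result is ToolResultOutputSketch {
  if (typeof result !== "object" || result === null) return false;
  const type = (result as { type?: unknown }).type;
  return typeof type === "string" && PASS_THROUGH_TYPES.has(type);
}

function toToolResultOutput(result: unknown): ToolResultOutputSketch {
  // Already in output form: hand it to the model untouched.
  if (isToolResultOutput(result)) return result;
  // Assumed fallback: wrap plain results as text or JSON.
  if (typeof result === "string") return { type: "text", value: result };
  return { type: "json", value: result };
}

// Multimodal result, modelled on the e2e fixture quoted later in this thread.
const multimodal = toToolResultOutput({
  type: "content",
  value: [
    { type: "text", text: "Here is the image" },
    { type: "file-data", data: "iVBORw0KGgo=", mediaType: "image/png" },
  ],
});

const wrapped = toToolResultOutput({ temperature: 21 });
```

One trade-off of a structural check like this is that an ordinary tool result which happens to carry a matching `type` field would also be passed through unchanged.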

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert contextStorage AsyncLocalStorage from module-scoped to a
globalThis singleton using Symbol.for(). This prevents esbuild's
module scope duplication from creating separate ALS instances, which
caused AsyncLocalStorage.getStore() to return undefined inside tool
execute() callbacks.
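
The singleton pattern described here can be sketched as follows. `getGlobalSingleton`, the key string, and `ContextStoreSketch` are hypothetical; in the PR the stored value is an AsyncLocalStorage instance, which a plain class stands in for so the example has no runtime dependencies.

```typescript
// Stores one instance per key on globalThis, creating it on first access.
function getGlobalSingleton<T>(key: symbol, create: () => T): T {
  const registry = globalThis as unknown as Record<symbol, unknown>;
  if (registry[key] === undefined) {
    registry[key] = create();
  }
  return registry[key] as T;
}

// Stand-in for the AsyncLocalStorage instance used in the PR.
class ContextStoreSketch {
  readonly values = new Map<string, unknown>();
}

// Each bundled copy of the module evaluates code like this separately.
// Symbol.for() looks the key up in the global symbol registry, so both
// copies resolve to the same symbol and therefore to the same instance.
const storageInCopyA = getGlobalSingleton(
  Symbol.for("example.contextStorage"), // hypothetical key
  () => new ContextStoreSketch(),
);
const storageInCopyB = getGlobalSingleton(
  Symbol.for("example.contextStorage"),
  () => new ContextStoreSketch(),
);

storageInCopyA.values.set("runId", "abc");
console.log(storageInCopyB.values.get("runId")); // same store, so "abc" is visible
```

A module-scoped `const` would instead yield one instance per duplicated module, which matches the symptom in the commit message: a value stored through one copy is invisible when read through another.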

Cherry-picked from 8ea3f89 (peter/v2-flow), excluding unrelated
v2-flow runtime changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VaguelySerious and others added 5 commits March 16, 2026 15:52
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
Next.js 16.2.0-canary.100+ has a regression where @workflow/ai step
files are missing from the step bundle, causing "doStreamStep not
found" errors that hang the agent tests until timeout.

Signed-off-by: Peter Wielander <mittgfu@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DEV_TEST_CONFIG was only set in the dev test job, so prod and postgres
canary jobs didn't skip agent tests. Add NEXT_CANARY env var to all
three local e2e test jobs (dev, prod, postgres) and use it directly.

Signed-off-by: Peter Wielander <mittgfu@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

karthikscale3 (Collaborator) left a comment

Changes look good to me.

  • o11y reporting matches 1:1 to aisdk
  • prepareStep addition looks good

pranaygp (Collaborator) left a comment

we should really update the e2e tests and the DurableAgent/ToolLoopAgent integration compat tests to prevent regressions here and to validate that the features actually work

thank you so much!!

VaguelySerious and others added 5 commits March 16, 2026 17:07
- agentConstructorPrepareStepE2e: verifies agent-level prepareStep is
  called for each LLM step through the full workflow runtime
- agentStreamPrepareStepOverrideE2e: verifies stream-level prepareStep
  overrides constructor-level
- agentMultimodalToolResultE2e: verifies tools returning
  LanguageModelV3ToolResultOutput (type: 'content') pass through the
  serialization/deserialization boundary correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add three tests to the ToolLoopAgent compat suite verifying:
- Constructor-level prepareStep is called on each LLM step
- Stream-level prepareStep overrides constructor-level
- Constructor prepareStep fires for every step in multi-step tool loops

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
prepareStep: ({ stepNumber }) => {
  stepNumbers.push(stepNumber);
  return {};
},
Collaborator

I'm curious if prepareStep is typically used as a step function, or used inline in the workflow context itself?

I did a quick scan on AI sdk docs and looks like your example is exactly what it's typically used for :)

Collaborator

no action needed - just curious

{ type: 'text', text: 'Here is the image' },
{ type: 'file-data', data: 'iVBORw0KGgo=', mediaType: 'image/png' },
],
};
Collaborator

https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling#multi-modal-tool-results

nit: should we actually use toModelOutput? looks like that's the intended way to use it since tool authors typically wouldn't write a tool to return in the model message format, but would use the converter

just to make the e2e tests more real? claude, you might want to validate that ToolLoopAgent also propagates this toModelOutput since that example was taken from generateText

pranaygp (Collaborator) left a comment

lgtm - thanks for adding the tests. left some comments :)

Labels

None yet

Projects

None yet

3 participants