QVAC-18183 feat[api]: cancel capability + per-handler cancel scope + structured logging #2036

Merged

NamelsKing merged 5 commits into tetherto:main from simon-iribarren:feat/qvac-18183-cancel-capability-per-handler-scope-logging on May 14, 2026
Conversation

@simon-iribarren (Contributor) commented May 13, 2026

Wider context. This PR lands the framework primitives for QVAC-18183 — Per-model concurrency policy and uniform handler/worker authoring. The previous epic milestone — QVAC-18182 — Typed cancel outcomes + KvCacheSession — landed in #2007. Subsequent handler-migration PRs continue against the same QVAC-18183 ticket.

🎯 What problem does this PR solve?

  • "What does cancel do for this handler" was implicit. Every cancel path through RequestRegistry worked after QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via KvCacheSession #2007, but the addon-side truth — does this handler accept a requestId, does it cancel at the model level, or is it un-cancellable? — lived in handler bodies and reviewer memory. Per-handler migration needs a declarative answer so it's a manifest change rather than a code spelunk.
  • No typed admission control on the SDK. The llama.cpp addon already serialises concurrent runs internally — exclusiveRunQueue queues them, and a second _runInternal(...) that finds _hasActiveResponse === true throws Error('Cannot set new job: a job is already set or being processed') (RUN_BUSY_ERROR_MESSAGE in packages/llm-llamacpp/index.js). The pre-existing failure surface was an opaque Error deep in the addon. This PR moves that admission rule up to the SDK as a typed RequestRejectedByPolicyError (52420) that fires before the addon is even called, so clients can instanceof-narrow and produce a stable "model busy" message instead of pattern-matching on a free-form string.
  • Per-request observability was free-form. "What happened on requestId X" required tailing free-form log lines and inferring causality across files. Bare has no APM agent inside the worker, so logs are the only cross-handler observability surface, and they need a fixed shape to be useful.

📝 How does it solve it?

Lands the three framework primitives so subsequent handler-migration PRs against QVAC-18183 have a single, declarative contract to slot into. Framework-only — no handler is migrated onto registry.begin(...) here beyond the completion handler that landed in #2007.

Deliverable 1 — PluginHandlerDefinition.cancel: { scope, hard? }

  • New types PluginHandlerCancel / PluginHandlerCancelScope on schemas/plugin.ts with runtime-schema validation on pluginHandlerDefinitionRuntimeSchema. Plugin manifests that omit cancel still load (it's optional); manifests with cancel.scope outside "request" | "model" | "none" are rejected at registration time.
  • Declared on every built-in plugin manifest. The truth-table assignment is pinned by test/unit/plugin-cancel-capability.test.ts:
| Plugin | Handler | cancel |
| --- | --- | --- |
| llamacpp-completion | completionStream, translate | { scope: "model", hard: true } |
| llamacpp-completion | finetune | { scope: "none" } |
| llamacpp-embedding | embed | { scope: "model", hard: true } |
| whispercpp-transcription | transcribe, transcribeStream | { scope: "model", hard: true } |
| parakeet-transcription | transcribe, transcribeStream | { scope: "model", hard: true } |
| nmtcpp-translation | translate | { scope: "none" } |
| onnx-tts | textToSpeech, textToSpeechStream | { scope: "none" } |
| onnx-ocr | ocrStream | { scope: "none" } |
| sdcpp-generation | diffusionStream | { scope: "model", hard: true } |
| sdcpp-generation | upscaleStream | { scope: "none" } |

scope: "none" is the soft-cancel contract — registry aborts the signal, the handler's stream stops yielding, result is discarded; the C++ work runs to completion in the background. No correctness issue because the result is no longer observed.

Deliverable 2 — RequestRegistry.policy({ kind, oneAtATimePerModel })

  • New policy(opts) method on RequestRegistry registers a ConcurrencyPolicy per RequestKind. The policy check runs before controller / scope allocation in begin(...), so a rejected begin leaves zero registry state behind.
  • Rejection throws RequestRejectedByPolicyError (code 52420) carrying { requestId, kind, modelId, reason }. Re-exported from @qvac/sdk for instanceof narrowing on the client side.
  • The worker singleton (request-registry-singleton.ts) installs { kind: "completion", oneAtATimePerModel: true } on first access — promoting the llama.cpp addon's internal _hasActiveResponse guard from an opaque deep-stack Error to a typed framework-level rejection.
  • The slot is held until await using unwinds, not when cancel(...) fires. A cancelled-but-still-draining request still owns its addon's KV-cache / decode loop, so admission stays held until dispose. Pinned by test/unit/runtime/request-registry.test.ts ("policy: cancel without dispose does NOT release admission") so a future contributor doesn't "fix" the semantic the wrong way.

Other kinds (embeddings, transcribe, …) are intentionally not policied here; subsequent handler-migration PRs on QVAC-18183 will pick admission rules per kind based on their addons' own concurrency models.
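
A rough sketch of the admission semantics above — names and data structures here are hypothetical, not the actual RequestRegistry internals: the slot is keyed per (kind, modelId), taken before any controller/scope allocation, and released only when the holder is disposed.

```ts
// Hypothetical sketch of oneAtATimePerModel admission — not the actual RequestRegistry code.
class AdmissionTable {
  private held = new Set<string>(); // key: `${kind}:${modelId}`

  // Called at the top of begin(...), before any controller/scope is allocated,
  // so a rejection leaves zero registry state behind.
  acquire(kind: string, modelId: string | undefined): () => void {
    if (modelId === undefined) return () => {}; // requests without a modelId are not policied
    const key = `${kind}:${modelId}`;
    if (this.held.has(key)) {
      // The real code throws RequestRejectedByPolicyError (52420)
      // carrying { requestId, kind, modelId, reason }.
      throw new Error(`busy: ${key}`);
    }
    this.held.add(key);
    // Release only on dispose (await using unwind) — cancel alone keeps the slot held,
    // because a cancelled-but-draining request still owns the addon's KV-cache / decode loop.
    return () => this.held.delete(key);
  }
}
```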

Deliverable 3 — Structured [request-lifecycle] logging

Three lines per request with a fixed shape:

[request-lifecycle] begin  requestId=<id> kind=<kind> modelId=<id|"-"> state=running
[request-lifecycle] cancel requestId=<id> kind=<kind> modelId=<id|"-"> state=cancelling
[request-lifecycle] end    requestId=<id> kind=<kind> modelId=<id|"-"> state=<terminal> durationMs=<n>
  • Levels are info everywhere except an end with state=failed, which fires at warn so ops alerting can predicate on level>=warn for this prefix without parsing the message body.
  • durationMs= (wall-clock from begin to dispose) lands on the end line so log shippers can compute per-request latency without diffing timestamps across two lines.
  • New withRequestContext(logger, ctx) helper in server/bare/runtime/with-request-context.ts wraps any Logger so every emit ships the same requestId=… kind=… modelId=… prefix. It passes through setLevel / getLevel / addTransport / setConsoleOutput so the wrapped instance is observably indistinguishable from the underlying logger except for the prefix.
  • withRequestContext(...) is threaded through completion(...) and into createKvCacheSession(modelId, { logger }), so KV-cache turn lifecycle (beginTurn, commitTurn, rollback, prime verification, file delete) inherits the request's correlation tuple. A single grep "requestId=abc-123" returns the full per-request story chronologically.
  • cancelEntry guards against duplicate cancel emits: a cancel({ requestId }) invoked twice (or once on an already-aborted entry) logs once.
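
A simplified sketch of the prefix wrapping — the real Logger interface and helper live in the SDK (server/bare/runtime/with-request-context.ts); the shapes below are illustrative:

```ts
// Simplified sketch — the real Logger interface and prefix format come from the SDK.
interface MiniLogger {
  info(msg: string, ...args: unknown[]): void;
  warn(msg: string, ...args: unknown[]): void;
}

interface RequestCtxLike {
  requestId: string;
  kind: string;
  modelId?: string;
}

// Hypothetical wrapper mirroring withRequestContext: every emit carries the same
// requestId/kind/modelId correlation tuple, so one grep reconstructs the request's story.
function withMiniRequestContext(logger: MiniLogger, ctx: RequestCtxLike): MiniLogger {
  const prefix =
    `[request-lifecycle ${ctx.kind} requestId=${ctx.requestId}` +
    (ctx.modelId !== undefined ? ` modelId=${ctx.modelId}` : "") +
    "]";
  return {
    info: (msg, ...args) => logger.info(`${prefix} ${msg}`, ...args),
    warn: (msg, ...args) => logger.warn(`${prefix} ${msg}`, ...args),
  };
}
```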

Cursor rules updated in this PR

  • .cursor/rules/sdk/request-lifecycle-primitives.mdc — cancel-capability declaration table for plugin authors, new "don't reach into addon cancel() directly" anti-pattern, concurrency-policy contract, structured-logging shape (now with durationMs + level-split), error-codes table now carries RequestRejectedByPolicyError (52420).
  • .cursor/rules/sdk/docs/request-lifecycle-system.mdc — migration-roadmap table updated; three new FAQ entries explain why each primitive was chosen (the rationale that's hard to recover from code alone); implementation files table covers the new modules.
  • .cursor/rules/sdk/error-handling.mdc — 52420 row added.

🧪 How was it tested?

Pre-merge gates (run from packages/sdk/):

bun run lint        # eslint + tsc, clean
bun run test:unit   # all files, all asserts pass

Unit tests added in this PR:

  • test/unit/plugin-cancel-capability.test.ts (7 tests): runtime-schema acceptance of the new optional cancel field; rejection of invalid scope values; rejection of non-boolean hard. A bareTest-gated section pins the per-plugin truth table by dynamic-importing every built-in manifest and asserting definition.cancel matches the expected { scope, hard } cell. Bare-only because the manifests transitively load native addons (@qvac/llm-llamacpp etc.) that don't resolve under Bun's resolver.

  • test/unit/runtime/request-registry.test.ts (24 tests): added 8 policy tests — oneAtATimePerModel rejects a second begin on the same (kind, modelId); scopes admission per (kind, modelId) not globally; ignores requests without modelId; disposing the holder releases admission; cancel without dispose does NOT release admission (pins the actual cancel/dispose semantic); replacing a policy by calling policy(...) again applies the new rule; kinds without a registered policy are unconstrained. Rejection-path assertions verify the typed RequestRejectedByPolicyError shape (requestId, kind, modelId, reason).

  • test/unit/runtime/request-lifecycle-logging.test.ts (7 tests): stub Logger captures info + warn calls; tests assert the begin/cancel/end line shape including state=running|cancelling|<terminal> + durationMs=<n> segment on end; single-cancel-emit guarantee when cancel({ requestId }) fires twice; cancelAll fan-out emits one cancel line per in-flight entry; cancel-before-begin race-close path emits a begin line with state=cancelling; failed end emits at warn level, not info (ops-alerting predicate).

  • test/unit/runtime/with-request-context.test.ts (5 tests): prefix shape across every log level (error / warn / info / debug / trace); modelId segment dropped when undefined; extra args after the leading message preserved; zero-argument emits still ship the prefix on its own; setLevel / getLevel / addTransport / setConsoleOutput write through to the underlying logger reference unchanged.

Why no e2e in this PR. Changes here are purely additive at the framework level:

  • The cancel field on PluginHandlerDefinition is optional and unread by any code path in this PR — subsequent handler-migration PRs on QVAC-18183 will start reading it.
  • The default oneAtATimePerModel policy for completion does not change observable behaviour for well-behaved single-completion clients — it changes the failure surface for concurrent completions from Error('Cannot set new job: a job is already set or being processed') (raised inside the llama.cpp addon) to a typed RequestRejectedByPolicyError (raised at registry.begin(...)). Clients that already handle the addon's error continue to work; clients that want typed handling get a stable instanceof check.
  • The structured logs are best-effort observability; they don't affect any control flow.

The existing e2e suite from #2007 (packages/sdk/tests-qvac/tests/cancellation-tests.ts) covers the begin/cancel/end paths and will continue to pass — CI will exercise it on this PR. Handler-migration e2e arrives with the subsequent handler-migration PRs on QVAC-18183.

🔌 API Changes

Three additive surfaces. Existing consumers compile and run unchanged.

1. PluginHandlerDefinition.cancel (new optional field)

import { definePlugin, defineHandler } from "@qvac/sdk";

definePlugin({
  manifestVersion: 1,
  handlers: {
    myStream: defineHandler({
      requestSchema,
      responseSchema,
      streaming: true,
      cancel: { scope: "model", hard: true }, // <-- new
      handler: async function* (request, ctx) { /* ... */ },
    }),
  },
});

scope values:

  • "request" — addon accepts a requestId token; cancel routes per-request.
  • "model" — addon cancels "the thing currently running on this model" (requires admission control like oneAtATimePerModel to be useful).
  • "none" — addon has no cancel surface; SDK falls back to soft-cancel (signal aborts, stream stops yielding, C++ runs to completion in background).

hard: true documents that the addon-side cancel interrupts compute. hard: false (or omitted) means cancel is best-effort.

Existing plugin manifests that omit cancel continue to load; the field is optional.

2. RequestRegistry.policy(...) (new method) + RequestRejectedByPolicyError

Two audiences, two import paths:

  • Client-side consumers narrow on the error type. RequestRejectedByPolicyError is on the public @qvac/sdk root (packages/sdk/index.ts):
import { RequestRejectedByPolicyError } from "@qvac/sdk";

try {
  for await (const e of sdk.completion({ ... })) { /* ... */ }
} catch (err) {
  if (err instanceof RequestRejectedByPolicyError) {
    showBusy({ modelId: err.modelId, reason: err.reason });
  } else {
    throw err;
  }
}
  • Handler authors inside the SDK package reach the registry through the worker-internal barrel @/server/bare/runtime — @qvac/sdk does not re-export these symbols; they live behind the bare-runtime boundary:
import { getRequestRegistry } from "@/server/bare/runtime";

const registry = getRequestRegistry();
registry.policy({ kind: "embeddings", oneAtATimePerModel: true });

await using ctx = registry.begin({ requestId, kind: "embeddings", modelId });
// ...

The worker singleton auto-installs the completion policy on first access; no caller code change required for that case. Other kinds remain unconstrained until a future PR opts them in.

3. withRequestContext(logger, ctx) (new helper)

Worker-internal — only handler authors writing code inside the SDK package use it directly. Imported from the same bare-runtime barrel as the registry:

import { getRequestRegistry, withRequestContext } from "@/server/bare/runtime";
import { getServerLogger } from "@/logging";

const registry = getRequestRegistry();
await using ctx = registry.begin({ requestId, kind: "completion", modelId });
const logger = withRequestContext(getServerLogger(), ctx);

logger.info("starting work"); // -> "[request-lifecycle completion requestId=... modelId=...] starting work"

Handler authors don't need to construct one themselves once handler migrations land on QVAC-18183 (the framework will hand them a pre-wired logger via ctx).

@simon-iribarren simon-iribarren requested review from a team as code owners May 13, 2026 19:59
@simon-iribarren simon-iribarren force-pushed the feat/qvac-18183-cancel-capability-per-handler-scope-logging branch 2 times, most recently from 42e7df4 to 80d15a2 on May 14, 2026 08:16
…structured logging

Lands the three M3a framework primitives so subsequent handler migration
sub-PRs (M3b/M3c) have a single, declarative contract to slot into:

1. `PluginHandlerDefinition.cancel: { scope: "request" | "model" | "none"; hard?: boolean }`
   - Added to `schemas/plugin.ts` (`PluginHandlerCancel`, `PluginHandlerCancelScope`)
     + runtime schema validation on `pluginHandlerDefinitionRuntimeSchema`.
   - Declared on every built-in plugin manifest (llamacpp-completion,
     llamacpp-embedding, whispercpp/parakeet-transcription, nmtcpp-translation,
     onnx-tts/ocr, sdcpp-generation). The truth-table assignment is pinned by
     `test/unit/plugin-cancel-capability.test.ts`.

2. `RequestRegistry.policy({ kind, oneAtATimePerModel })`:
   - Admission control runs before scope/controller allocation in `begin(...)`.
     Rejecting a request raises `RequestRejectedByPolicyError` (52420) carrying
     `requestId`, `kind`, `modelId`, `reason` — re-exported from `@qvac/sdk` for
     `instanceof` checks.
   - The worker singleton installs `{ kind: "completion", oneAtATimePerModel: true }`
     on first access, matching the llama.cpp addon's single-decode-loop reality.

3. Structured `[request-lifecycle]` emits at begin/cancel/end:
   - Fixed log shape `requestId=<id> kind=<kind> modelId=<id|"-"> state=<state>` so
     `grep "requestId=abc"` returns the full per-request story chronologically.
   - `withRequestContext(logger, ctx)` extends the same prefix to handler-level
     emits; threaded through `completion(...)` and into `KvCacheSession` so
     KV-cache turn lifecycle shares the request's correlation tuple.
   - Single-cancel-emit guard suppresses duplicate cancel lines when
     `cancel({ requestId })` is invoked twice.

Verification (from `packages/sdk/`):
- `bun run lint` (eslint + tsc): clean.
- `bun run test:unit`: 49 files / all asserts pass, including the 4 M3a test
  files (`plugin-cancel-capability` 7/7, `request-registry` 41/41,
  `request-lifecycle-logging` 6/6, `with-request-context` 5/5).

Cursor rules updated alongside the code:
- `request-lifecycle-primitives.mdc`: cancel-capability declaration table,
  concurrency-policy contract, structured-logging shape, error-codes table now
  carries 52420.
- `docs/request-lifecycle-system.mdc`: migration-roadmap table reflects M3a
  shipped; three new FAQ entries explain *why* each primitive was chosen;
  implementation files table covers the new modules.
- `error-handling.mdc`: 52420 row added.

This PR is framework-only — no handler is migrated onto `registry.begin(...)` here
beyond the completion handler that landed in M2. Handler migrations follow in
M3b (inference handlers), M3c (non-inference / addon handlers), and M3d (CLI
cancel bridge + cancelHandler retirement).
@simon-iribarren simon-iribarren force-pushed the feat/qvac-18183-cancel-capability-per-handler-scope-logging branch from 80d15a2 to 31605bd on May 14, 2026 08:21
@simon-iribarren simon-iribarren added test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] and removed test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] labels May 14, 2026
@opaninakuffo (Contributor) commented:

The framework looks solid: policy runs before any controller/scope allocation, RequestRejectedByPolicyError is wired with a dedicated code, lifecycle logs are keyed and failed bumps to warn as advertised, and the unit-test matrix matches the PR narrative.

@opaninakuffo (Contributor) commented:

nit: the PR “API Changes” snippet imports getRequestRegistry / withRequestContext from @qvac/sdk, but packages/sdk/index.ts only adds RequestRejectedByPolicyError on the public root — those two symbols live on the bare runtime module path, not the root export map. Worth retuning the example so copy-paste matches reality (or explicitly exporting them if that was the intent).

@simon-iribarren simon-iribarren added the verified Authorize secrets / label-gate in PR workflows label May 14, 2026
@tetherto tetherto deleted a comment from github-actions Bot May 14, 2026
@simon-iribarren (Contributor, Author) commented:

/review

github-actions bot commented May 14, 2026

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@simon-iribarren simon-iribarren removed test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] verified Authorize secrets / label-gate in PR workflows labels May 14, 2026
@NamelsKing NamelsKing added the verified Authorize secrets / label-gate in PR workflows label May 14, 2026
@NamelsKing (Contributor) commented:

/review

@NamelsKing NamelsKing merged commit 9cc6018 into tetherto:main May 14, 2026
31 checks passed
Victor-Rodzko added a commit that referenced this pull request May 14, 2026
After PR #2036 (M3a) installed the `oneAtATimePerModel` policy on the
completion registry, back-to-back `completion({ modelId })` calls hit a
small disposal window where the prior request's slot is still alive in
`RequestRegistry`. Five `logging-*` e2e tests started failing in ~100 ms
with `RequestRejectedByPolicyError` (52420), each immediately after
`logging-persist-across-reload`.

Test-side mitigation only:
- `callWhenAddonIdle` retries on both the legacy llama.cpp busy throw
  and the new typed policy rejection.
- `runReload` joins its own triggering completion before returning so
  `collectLogs`'s `Promise.race` cannot hand the next test a
  still-running slot.

The underlying SDK race (client `final` resolves before server
`await using ctx` disposal) needs a separate SDK-side fix; see
QVAC-18935.

Refs: QVAC-18935
Victor-Rodzko added a commit that referenced this pull request May 14, 2026
…gging tests (#2065)

* test[skiplog]: retry on RequestRejectedByPolicyError in logging tests

After PR #2036 (M3a) installed the `oneAtATimePerModel` policy on the
completion registry, back-to-back `completion({ modelId })` calls hit a
small disposal window where the prior request's slot is still alive in
`RequestRegistry`. Five `logging-*` e2e tests started failing in ~100 ms
with `RequestRejectedByPolicyError` (52420), each immediately after
`logging-persist-across-reload`.

Test-side mitigation only:
- `callWhenAddonIdle` retries on both the legacy llama.cpp busy throw
  and the new typed policy rejection.
- `runReload` joins its own triggering completion before returning so
  `collectLogs`'s `Promise.race` cannot hand the next test a
  still-running slot.

The underlying SDK race (client `final` resolves before server
`await using ctx` disposal) needs a separate SDK-side fix; see
QVAC-18935.

Refs: QVAC-18935

* fixup: match RequestRejectedByPolicyError by code, not instanceof

Server errors cross the RPC boundary as `RPCError`, so `instanceof
RequestRejectedByPolicyError` is always false on the client. CI on
PR #2065 confirmed this: `addon-logging-during-inference` failed in
204 ms (< 250 ms retry interval), meaning the first attempt's error
was not recognised as transient.

Match by `code === SDK_SERVER_ERROR_CODES.REQUEST_REJECTED_BY_POLICY`
(52420) — the documented client-side check pattern.

Refs: QVAC-18935
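
A minimal sketch of that code-based check — the 52420 value comes from the PR; the exact field layout of the error object that arrives over RPC is assumed:

```ts
// Hedged sketch: match the policy rejection by its documented code rather than instanceof,
// since server errors cross the RPC boundary as RPCError and lose their prototype.
const REQUEST_REJECTED_BY_POLICY = 52420;

function isPolicyRejection(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === REQUEST_REJECTED_BY_POLICY
  );
}
```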