QVAC-18183 feat[api]: cancel capability + per-handler cancel scope + structured logging (#2036)
Merged
NamelsKing merged 5 commits on May 14, 2026
Conversation
42e7df4 to 80d15a2
…structured logging
Lands the three M3a framework primitives so subsequent handler migration
sub-PRs (M3b/M3c) have a single, declarative contract to slot into:
1. `PluginHandlerDefinition.cancel: { scope: "request" | "model" | "none"; hard?: boolean }`
- Added to `schemas/plugin.ts` (`PluginHandlerCancel`, `PluginHandlerCancelScope`)
+ runtime schema validation on `pluginHandlerDefinitionRuntimeSchema`.
- Declared on every built-in plugin manifest (llamacpp-completion,
llamacpp-embedding, whispercpp/parakeet-transcription, nmtcpp-translation,
onnx-tts/ocr, sdcpp-generation). The truth-table assignment is pinned by
`test/unit/plugin-cancel-capability.test.ts`.
2. `RequestRegistry.policy({ kind, oneAtATimePerModel })`:
- Admission control runs before scope/controller allocation in `begin(...)`.
Rejecting a request raises `RequestRejectedByPolicyError` (52420) carrying
`requestId`, `kind`, `modelId`, `reason` — re-exported from `@qvac/sdk` for
`instanceof` checks.
- The worker singleton installs `{ kind: "completion", oneAtATimePerModel: true }`
on first access, matching the llama.cpp addon's single-decode-loop reality.
3. Structured `[request-lifecycle]` emits at begin/cancel/end:
- Fixed log shape `requestId=<id> kind=<kind> modelId=<id|"-"> state=<state>` so
`grep "requestId=abc"` returns the full per-request story chronologically.
- `withRequestContext(logger, ctx)` extends the same prefix to handler-level
emits; threaded through `completion(...)` and into `KvCacheSession` so
KV-cache turn lifecycle shares the request's correlation tuple.
- Single-cancel-emit guard suppresses duplicate cancel lines when
`cancel({ requestId })` is invoked twice.
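The fixed line shape above can be sketched as a small formatter. This is illustrative only — the real emit site lives in the SDK's registry, and the exact placement of the `[request-lifecycle]` prefix relative to the key/value segments is an assumption; only the grep-able `requestId=… kind=… modelId=… state=…` contract is stated by the PR.

```typescript
// Illustrative formatter for the fixed lifecycle line shape (a sketch,
// not the SDK implementation). modelId renders as "-" when absent.
function lifecycleLine(
  requestId: string,
  kind: string,
  state: string,
  modelId?: string,
): string {
  return `[request-lifecycle] requestId=${requestId} kind=${kind} modelId=${modelId ?? "-"} state=${state}`;
}

// grep "requestId=abc" then returns the begin/cancel/end story chronologically:
const line = lifecycleLine("abc", "completion", "running", "llama-3");
// "[request-lifecycle] requestId=abc kind=completion modelId=llama-3 state=running"
```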
Verification (from `packages/sdk/`):
- `bun run lint` (eslint + tsc): clean.
- `bun run test:unit`: 49 files / all asserts pass, including the 4 M3a test
files (`plugin-cancel-capability` 7/7, `request-registry` 41/41,
`request-lifecycle-logging` 6/6, `with-request-context` 5/5).
Cursor rules updated alongside the code:
- `request-lifecycle-primitives.mdc`: cancel-capability declaration table,
concurrency-policy contract, structured-logging shape, error-codes table now
carries 52420.
- `docs/request-lifecycle-system.mdc`: migration-roadmap table reflects M3a
shipped; three new FAQ entries explain *why* each primitive was chosen;
implementation files table covers the new modules.
- `error-handling.mdc`: 52420 row added.
This PR is framework-only — no handler is migrated onto `registry.begin(...)` here
beyond the completion handler that landed in M2. Handler migrations follow in
M3b (inference handlers), M3c (non-inference / addon handlers), and M3d (CLI
cancel bridge + cancelHandler retirement).
80d15a2 to
31605bd
Compare
Contributor
The framework looks solid: policy runs before any controller/scope allocation, …
Contributor
nit: the PR “API Changes” snippet imports …
opaninakuffo approved these changes May 14, 2026
NamelsKing approved these changes May 14, 2026
Contributor (Author)
/review
Tier-based Approval Status
Contributor
/review
Victor-Rodzko added a commit that referenced this pull request on May 14, 2026
After PR #2036 (M3a) installed the `oneAtATimePerModel` policy on the completion registry, back-to-back `completion({ modelId })` calls hit a small disposal window where the prior request's slot is still alive in `RequestRegistry`. Five `logging-*` e2e tests started failing in ~100 ms with `RequestRejectedByPolicyError` (52420), each immediately after `logging-persist-across-reload`.

Test-side mitigation only:
- `callWhenAddonIdle` retries on both the legacy llama.cpp busy throw and the new typed policy rejection.
- `runReload` joins its own triggering completion before returning so `collectLogs`'s `Promise.race` cannot hand the next test a still-running slot.

The underlying SDK race (client `final` resolves before server `await using ctx` disposal) needs a separate SDK-side fix; see QVAC-18935.

Refs: QVAC-18935
Victor-Rodzko added a commit that referenced this pull request on May 14, 2026
…gging tests (#2065)

* test[skiplog]: retry on RequestRejectedByPolicyError in logging tests

After PR #2036 (M3a) installed the `oneAtATimePerModel` policy on the completion registry, back-to-back `completion({ modelId })` calls hit a small disposal window where the prior request's slot is still alive in `RequestRegistry`. Five `logging-*` e2e tests started failing in ~100 ms with `RequestRejectedByPolicyError` (52420), each immediately after `logging-persist-across-reload`.

Test-side mitigation only:
- `callWhenAddonIdle` retries on both the legacy llama.cpp busy throw and the new typed policy rejection.
- `runReload` joins its own triggering completion before returning so `collectLogs`'s `Promise.race` cannot hand the next test a still-running slot.

The underlying SDK race (client `final` resolves before server `await using ctx` disposal) needs a separate SDK-side fix; see QVAC-18935. Refs: QVAC-18935

* fixup: match RequestRejectedByPolicyError by code, not instanceof

Server errors cross the RPC boundary as `RPCError`, so `instanceof RequestRejectedByPolicyError` is always false on the client. CI on PR #2065 confirmed this: `addon-logging-during-inference` failed in 204 ms (< 250 ms retry interval), meaning the first attempt's error was not recognised as transient. Match by `code === SDK_SERVER_ERROR_CODES.REQUEST_REJECTED_BY_POLICY` (52420) — the documented client-side check pattern. Refs: QVAC-18935
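The fixup's client-side check pattern can be sketched as follows. The `RPCError` class below is an illustrative stand-in for whatever wrapper the RPC layer produces — the point is that the prototype is lost across the boundary, so the numeric code is the only stable discriminator.

```typescript
// Sketch of "match by code, not instanceof": after the RPC boundary the
// server error arrives as a generic wrapper, so an instanceof check against
// RequestRejectedByPolicyError is always false. Match on the code instead.
const REQUEST_REJECTED_BY_POLICY = 52420;

// Illustrative stand-in for the error shape the client actually receives.
class RPCError extends Error {
  constructor(message: string, readonly code: number) {
    super(message);
  }
}

function isTransientPolicyRejection(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === REQUEST_REJECTED_BY_POLICY
  );
}
```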
🎯 What problem does this PR solve?
`RequestRegistry` worked after QVAC-18182 feat[api]: typed cancel outcomes on the wire + atomic KV-cache via `KvCacheSession` (#2007), but the addon-side truth — does this handler accept a `requestId`, does it cancel at the model level, or is it un-cancellable? — lived in handler bodies and reviewer memory. Per-handler migration needs a declarative answer so it's a manifest change rather than a code spelunk.

`exclusiveRunQueue` queues concurrent completions, and a second `_runInternal(...)` that finds `_hasActiveResponse === true` throws `Error('Cannot set new job: a job is already set or being processed')` (`RUN_BUSY_ERROR_MESSAGE` in `packages/llm-llamacpp/index.js`). The pre-existing failure surface was an opaque `Error` deep in the addon. This PR moves that admission rule up to the SDK as a typed `RequestRejectedByPolicyError` (52420) that fires before the addon is even called, so clients can `instanceof`-narrow and produce a stable "model busy" message instead of pattern-matching on a free-form string.

📝 How does it solve it?
Lands the three framework primitives so subsequent handler-migration PRs against QVAC-18183 have a single, declarative contract to slot into. Framework-only — no handler is migrated onto `registry.begin(...)` here beyond the `completion` handler that landed in #2007.

Deliverable 1 — `PluginHandlerDefinition.cancel: { scope, hard? }`

- `PluginHandlerCancel` / `PluginHandlerCancelScope` added to `schemas/plugin.ts`, with runtime-schema validation on `pluginHandlerDefinitionRuntimeSchema`. Plugin manifests that omit `cancel` still load (it's optional); manifests with `cancel.scope` outside `"request" | "model" | "none"` are rejected at registration time.
- The per-plugin truth table is pinned by `test/unit/plugin-cancel-capability.test.ts`:

| Plugin | Handler(s) | `cancel` |
| --- | --- | --- |
| llamacpp-completion | `completionStream`, `translate` | `{ scope: "model", hard: true }` |
| llamacpp-completion | `finetune` | `{ scope: "none" }` |
| llamacpp-embedding | `embed` | `{ scope: "model", hard: true }` |
| whispercpp-transcription | `transcribe`, `transcribeStream` | `{ scope: "model", hard: true }` |
| parakeet-transcription | `transcribe`, `transcribeStream` | `{ scope: "model", hard: true }` |
| nmtcpp-translation | `translate` | `{ scope: "none" }` |
| onnx-tts | `textToSpeech`, `textToSpeechStream` | `{ scope: "none" }` |
| onnx-ocr | `ocrStream` | `{ scope: "none" }` |
| sdcpp-generation | `diffusionStream` | `{ scope: "model", hard: true }` |
| sdcpp-generation | `upscaleStream` | `{ scope: "none" }` |

- `scope: "none"` is the soft-cancel contract — the registry aborts the signal, the handler's stream stops yielding, and the result is discarded; the C++ work runs to completion in the background. No correctness issue, because the result is no longer observed.
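A minimal sketch of what declaring this capability on a manifest could look like. The field names (`scope`, `hard`) and the type names follow the PR; the surrounding `PluginHandlerDefinition` shape shown here is a simplified assumption, not the real schema.

```typescript
// Illustrative sketch of the cancel-capability shape. Only scope/hard and
// the type names come from the PR; the rest is a simplified stand-in.
type PluginHandlerCancelScope = "request" | "model" | "none";

interface PluginHandlerCancel {
  scope: PluginHandlerCancelScope;
  // true: addon-side cancel interrupts compute; false/omitted: best-effort
  hard?: boolean;
}

interface PluginHandlerDefinition {
  name: string;
  cancel?: PluginHandlerCancel; // optional — manifests omitting it still load
}

// e.g. a completion handler declaring hard model-level cancel:
const completionStream: PluginHandlerDefinition = {
  name: "completionStream",
  cancel: { scope: "model", hard: true },
};
```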
Deliverable 2 — `RequestRegistry.policy({ kind, oneAtATimePerModel })`

- A `policy(opts)` method on `RequestRegistry` registers a `ConcurrencyPolicy` per `RequestKind`. The policy check runs before controller/scope allocation in `begin(...)`, so a rejected `begin` leaves zero registry state behind.
- Rejection raises `RequestRejectedByPolicyError` (code 52420) carrying `{ requestId, kind, modelId, reason }`. Re-exported from `@qvac/sdk` for `instanceof` narrowing on the client side.
- The worker singleton (`request-registry-singleton.ts`) installs `{ kind: "completion", oneAtATimePerModel: true }` on first access — promoting the llama.cpp addon's internal `_hasActiveResponse` guard from an opaque deep-stack `Error` to a typed framework-level rejection.
- Admission is released when `await using` unwinds, not when `cancel(...)` fires. A cancelled-but-still-draining request still owns its addon's KV-cache / decode loop, so admission stays held until dispose. Pinned by `test/unit/runtime/request-registry.test.ts` ("policy: cancel without dispose does NOT release admission") so a future contributor doesn't "fix" the semantic the wrong way.

Other kinds (`embeddings`, `transcribe`, …) are intentionally not policied here; subsequent handler-migration PRs on QVAC-18183 will pick admission rules per kind based on their addons' own concurrency models.
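The admission semantics above can be sketched with a toy registry. Everything below is an illustrative stand-in for the real `RequestRegistry` (which uses `await using` disposal); it only demonstrates the two pinned behaviours: rejection happens before any state is allocated, and cancel does not release admission — only dispose does.

```typescript
// Toy sketch of the admission contract (not SDK code): per-(kind, modelId)
// one-at-a-time policy, typed rejection, and release-on-dispose-not-cancel.
class RequestRejectedByPolicyError extends Error {
  readonly code = 52420;
  constructor(
    readonly requestId: string,
    readonly kind: string,
    readonly modelId: string,
    readonly reason: string,
  ) {
    super(`request ${requestId} rejected: ${reason}`);
  }
}

class MiniRegistry {
  private inFlight = new Set<string>();
  private oneAtATime = new Set<string>(); // kinds with oneAtATimePerModel

  policy(opts: { kind: string; oneAtATimePerModel: boolean }): void {
    if (opts.oneAtATimePerModel) this.oneAtATime.add(opts.kind);
    else this.oneAtATime.delete(opts.kind);
  }

  begin(requestId: string, kind: string, modelId: string) {
    const key = `${kind}:${modelId}`;
    // Admission check runs first: a rejected begin leaves zero state behind.
    if (this.oneAtATime.has(kind) && this.inFlight.has(key)) {
      throw new RequestRejectedByPolicyError(requestId, kind, modelId, "model busy");
    }
    this.inFlight.add(key);
    return {
      cancelled: false,
      cancel() {
        this.cancelled = true; // flips state; admission stays held
      },
      dispose: () => this.inFlight.delete(key), // only dispose releases admission
    };
  }
}
```

A second `begin` on the same `(kind, modelId)` throws even after the first slot was cancelled; after `dispose()` the model admits again.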
Deliverable 3 — Structured `[request-lifecycle]` logging

Three lines per request — begin, cancel, end — with a fixed shape.
- Lines fire at `info` everywhere except an `end` with `state=failed`, which fires at `warn` so ops alerting can predicate on `level >= warn` for this prefix without parsing the message body.
- `durationMs=<n>` (wall-clock from begin to dispose) lands on the `end` line so log shippers can compute per-request latency without diffing timestamps across two lines.
- A `withRequestContext(logger, ctx)` helper in `server/bare/runtime/with-request-context.ts` wraps any `Logger` so every emit ships the same `requestId=… kind=… modelId=…` prefix. It passes through `setLevel` / `getLevel` / `addTransport` / `setConsoleOutput`, so the wrapped instance is observably indistinguishable from the underlying logger except for the prefix.
- `withRequestContext(...)` is threaded through `completion(...)` and into `createKvCacheSession(modelId, { logger })`, so KV-cache turn lifecycle (`beginTurn`, `commitTurn`, `rollback`, prime verification, file delete) inherits the request's correlation tuple. A single `grep "requestId=abc-123"` returns the full per-request story chronologically.
- `cancelEntry` guards against duplicate cancel emits: a `cancel({ requestId })` invoked twice (or once on an already-aborted entry) logs once.

Cursor rules updated in this PR
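The level split and `durationMs` segment on the `end` line can be sketched as a tiny emit helper. This is an assumption-laden illustration: the terminal state name `"done"` below is a placeholder, and only `state=failed` → `warn` and `durationMs` on the end line come from the PR.

```typescript
// Illustrative end-line emit: every lifecycle line is info except a failed
// end, which fires at warn so ops alerting can predicate on level >= warn
// without parsing the message body. "done" is a placeholder terminal state.
type Level = "info" | "warn";

function endLine(state: string, durationMs: number): { level: Level; line: string } {
  return {
    level: state === "failed" ? "warn" : "info",
    line: `state=${state} durationMs=${durationMs}`,
  };
}
```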
- `.cursor/rules/sdk/request-lifecycle-primitives.mdc` — cancel-capability declaration table for plugin authors, new "don't reach into addon `cancel()` directly" anti-pattern, concurrency-policy contract, structured-logging shape (now with `durationMs` + level split), error-codes table now carries `RequestRejectedByPolicyError` (52420).
- `.cursor/rules/sdk/docs/request-lifecycle-system.mdc` — migration-roadmap table updated; three new FAQ entries explain why each primitive was chosen (the rationale that's hard to recover from code alone); implementation files table covers the new modules.
- `.cursor/rules/sdk/error-handling.mdc` — 52420 row added.

🧪 How was it tested?
Pre-merge gates (run from `packages/sdk/`):

Unit tests added in this PR:
- `test/unit/plugin-cancel-capability.test.ts` (7 tests): runtime-schema acceptance of the new optional `cancel` field; rejection of invalid `scope` values; rejection of non-boolean `hard`. A `bareTest`-gated section pins the per-plugin truth table by dynamic-importing every built-in manifest and asserting `definition.cancel` matches the expected `{ scope, hard }` cell. Bare-only because the manifests transitively load native addons (`@qvac/llm-llamacpp` etc.) that don't resolve under Bun's resolver.
- `test/unit/runtime/request-registry.test.ts` (24 tests): added 8 policy tests — `oneAtATimePerModel` rejects a second `begin` on the same `(kind, modelId)`; scopes admission per `(kind, modelId)`, not globally; ignores requests without `modelId`; disposing the holder releases admission; `cancel` without dispose does NOT release admission (pins the actual cancel/dispose semantic); replacing a policy by calling `policy(...)` again applies the new rule; kinds without a registered policy are unconstrained. Rejection-path assertions verify the typed `RequestRejectedByPolicyError` shape (`requestId`, `kind`, `modelId`, `reason`).
- `test/unit/runtime/request-lifecycle-logging.test.ts` (7 tests): a stub `Logger` captures `info` + `warn` calls; tests assert the begin/cancel/end line shape including `state=running|cancelling|<terminal>` plus the `durationMs=<n>` segment on end; the single-cancel-emit guarantee when `cancel({ requestId })` fires twice; `cancelAll` fan-out emits one `cancel` line per in-flight entry; the cancel-before-begin race-close path emits a `begin` line with `state=cancelling`; a `failed` end emits at `warn` level, not `info` (ops-alerting predicate).
- `test/unit/runtime/with-request-context.test.ts` (5 tests): prefix shape across every log level (error/warn/info/debug/trace); `modelId` segment dropped when `undefined`; extra args after the leading message preserved; zero-argument emits still ship the prefix on its own; `setLevel` / `getLevel` / `addTransport` / `setConsoleOutput` write through to the underlying logger reference unchanged.
cancelfield onPluginHandlerDefinitionis optional and unread by any code path in this PR — subsequent handler-migration PRs on QVAC-18183 will start reading it.oneAtATimePerModelpolicy forcompletiondoes not change observable behaviour for well-behaved single-completion clients — it changes the failure surface for concurrent completions fromError('Cannot set new job: a job is already set or being processed')(raised inside the llama.cpp addon) to a typedRequestRejectedByPolicyError(raised atregistry.begin(...)). Clients that already handle the addon's error continue to work; clients that want typed handling get a stableinstanceofcheck.The existing e2e suite from #2007 (
packages/sdk/tests-qvac/tests/cancellation-tests.ts) covers the begin/cancel/end paths and will continue to pass — CI will exercise it on this PR. Handler-migration e2e arrives with the subsequent handler-migration PRs on QVAC-18183.🔌 API Changes
Three additive surfaces. Existing consumers compile and run unchanged.
1. `PluginHandlerDefinition.cancel` (new optional field)

`scope` values:

- `"request"` — the addon accepts a `requestId` token; cancel routes per-request.
- `"model"` — the addon cancels "the thing currently running on this model" (requires admission control like `oneAtATimePerModel` to be useful).
- `"none"` — the addon has no cancel surface; the SDK falls back to soft-cancel (signal aborts, stream stops yielding, C++ runs to completion in the background).

`hard: true` documents that the addon-side cancel interrupts compute; `hard: false` (or omitted) means cancel is best-effort.

Existing plugin manifests that omit `cancel` continue to load; the field is optional.
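The soft-cancel fallback for `scope: "none"` can be sketched as a stream wrapper: the abort signal stops the stream from yielding, while the producer keeps running and its result is simply never observed. This is a minimal sketch of the contract, not the SDK's implementation.

```typescript
// Illustrative soft-cancel: once the signal aborts, stop yielding and
// discard the rest. The underlying producer is not interrupted — its
// remaining work runs to completion in the background, unobserved.
async function* softCancellable<T>(
  source: AsyncIterable<T>,
  signal: AbortSignal,
): AsyncGenerator<T> {
  for await (const chunk of source) {
    if (signal.aborted) return; // result discarded; producer keeps running
    yield chunk;
  }
}
```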
2. `RequestRegistry.policy(...)` (new method) + `RequestRejectedByPolicyError`

Two audiences, two import paths:

- `RequestRejectedByPolicyError` is on the public `@qvac/sdk` root (`packages/sdk/index.ts`).
- The registry surface is imported from `@/server/bare/runtime` — a path `@qvac/sdk` does not re-export; these live behind the bare-runtime boundary.

The worker singleton auto-installs the `completion` policy on first access; no caller code change required for that case. Other kinds remain unconstrained until a future PR opts them in.
3. `withRequestContext(logger, ctx)` (new helper)
Handler authors don't need to construct one themselves once handler migrations land on QVAC-18183 (the framework will hand them a pre-wired logger via
ctx).
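A minimal sketch of the wrapping idea behind `withRequestContext`. The `Logger` interface here is a one-method stand-in (the real helper also passes through `setLevel` / `getLevel` / `addTransport` / `setConsoleOutput` and covers all levels); the prefix shape and the dropped-`modelId` behaviour follow the PR's test descriptions.

```typescript
// Sketch of a context-prefixing logger wrapper (illustrative, not SDK code).
interface Logger {
  info(msg: string, ...args: unknown[]): void;
}

interface RequestContext {
  requestId: string;
  kind: string;
  modelId?: string;
}

function withRequestContext(logger: Logger, ctx: RequestContext): Logger {
  // modelId segment is dropped entirely when undefined.
  const prefix =
    `requestId=${ctx.requestId} kind=${ctx.kind}` +
    (ctx.modelId !== undefined ? ` modelId=${ctx.modelId}` : "");
  return {
    info: (msg, ...args) => logger.info(`${prefix} ${msg}`, ...args),
  };
}
```

Every emit through the wrapped instance then carries the request's correlation tuple, so handler-level lines grep together with the registry's `[request-lifecycle]` lines.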