QVAC-18935 test[skiplog]: retry on RequestRejectedByPolicyError in logging tests #2065
After PR #2036 (M3a) installed the `oneAtATimePerModel` policy on the completion registry, back-to-back `completion({ modelId })` calls hit a small disposal window where the prior request's slot is still alive in `RequestRegistry`. Five `logging-*` e2e tests started failing in ~100 ms with `RequestRejectedByPolicyError` (52420), each immediately after `logging-persist-across-reload`.

Test-side mitigation only:
- `callWhenAddonIdle` retries on both the legacy llama.cpp busy throw and the new typed policy rejection.
- `runReload` joins its own triggering completion before returning so `collectLogs`'s `Promise.race` cannot hand the next test a still-running slot.

The underlying SDK race (client `final` resolves before server `await using ctx` disposal) needs a separate SDK-side fix; see QVAC-18935.

Refs: QVAC-18935
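The `Promise.race` hazard described above can be reproduced in isolation. The following is a minimal sketch, not the test suite's code: `fakeCompletion`, `slotBusy`, `raceOnly`, `raceThenJoin` and `demo` are hypothetical names, and the timings are arbitrary. It shows that a race returning does not mean the losing promise has finished, and that joining the completion before returning closes that gap.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a completion that holds a per-model slot until it finishes.
let slotBusy = false;

async function fakeCompletion(durationMs: number): Promise<void> {
  slotBusy = true;
  try {
    await sleep(durationMs);
  } finally {
    slotBusy = false; // slot is released only when the completion really ends
  }
}

// Returns as soon as either promise settles; the completion may still be running.
async function raceOnly(): Promise<boolean> {
  const completion = fakeCompletion(50);
  await Promise.race([completion, sleep(10)]);
  const busyAtReturn = slotBusy; // what the next caller would observe
  await completion; // cleanup so the demo leaves no work running
  return busyAtReturn;
}

// Same race, but the caller joins its own completion before reporting back.
async function raceThenJoin(): Promise<boolean> {
  const completion = fakeCompletion(50);
  await Promise.race([completion, sleep(10)]);
  await completion; // the join: wait until the slot has been released
  return slotBusy;
}

async function demo(): Promise<{ raceOnly: boolean; raceThenJoin: boolean }> {
  return { raceOnly: await raceOnly(), raceThenJoin: await raceThenJoin() };
}
```

Calling `demo()` resolves to `{ raceOnly: true, raceThenJoin: false }`: without the join the slot is still occupied when control returns, with the join it is free.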
Server errors cross the RPC boundary as `RPCError`, so `instanceof RequestRejectedByPolicyError` is always false on the client. CI on PR #2065 confirmed this: `addon-logging-during-inference` failed in 204 ms (< 250 ms retry interval), meaning the first attempt's error was not recognised as transient.

Match by `code === SDK_SERVER_ERROR_CODES.REQUEST_REJECTED_BY_POLICY` (52420), the documented client-side check pattern.

Refs: QVAC-18935
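A minimal sketch of the code-based check, assuming the client-side error exposes a numeric `code` field. The `RPCError` class and the one-entry `SDK_SERVER_ERROR_CODES` table below are local stand-ins (the real SDK shapes may differ), and `isPolicyRejection` is a hypothetical helper name; only the constant name and the value 52420 come from the commit message.

```typescript
// Stand-in for the SDK constant table; only this entry (52420) is taken from the PR text.
const SDK_SERVER_ERROR_CODES = { REQUEST_REJECTED_BY_POLICY: 52420 } as const;

// Hypothetical stand-in for the client-side error type; the real RPCError may differ.
class RPCError extends Error {
  readonly code: number;
  constructor(message: string, code: number) {
    super(message);
    this.name = "RPCError";
    this.code = code;
  }
}

// Stand-in for the server-side class. Instances never reach the client,
// so an `instanceof` check against it cannot match there.
class RequestRejectedByPolicyError extends Error {
  readonly code = SDK_SERVER_ERROR_CODES.REQUEST_REJECTED_BY_POLICY;
}

// Hypothetical helper: recognise the rejection by numeric code instead of by class.
function isPolicyRejection(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code ===
      SDK_SERVER_ERROR_CODES.REQUEST_REJECTED_BY_POLICY
  );
}

// What the client sees after the error has crossed the RPC boundary.
const received: unknown = new RPCError("request rejected by policy", 52420);

console.log(received instanceof RequestRejectedByPolicyError); // false
console.log(isPolicyRejection(received)); // true
```

Matching on the code keeps the check independent of which class the transport happens to rehydrate on the client side.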
a0334b6
Non-blocking, approve stands, but since you mentioned still seeing some Linux flake, two thoughts that might help:

1. The […] Rather than open-code the […]

```ts
async function collectLogs(opts: CollectLogsOptions): Promise<CollectLogsResult> {
  // ... existing setup ...
  let triggerPromise: Promise<void> = Promise.resolve();
  const triggerWrapper = (async () => {
    await sleep(STREAM_OPEN_DELAY_MS + preTriggerExtraWaitMs);
    if (trigger) {
      triggerStartMs = Date.now();
      triggerPromise = trigger();
      await triggerPromise;
    }
  })();
  await Promise.race([collectPromise, triggerWrapper.then(() => sleep(postTriggerWaitMs))]);
  await triggerPromise.catch(() => {});
  return { logs, passed: logs.length > 0, output: /* … */ };
}
```

Closes the race for all four handlers, and […]

2. Tiny readability nit. The […]

Either way the fix is correctly scoped and the approval stands.
🎯 What problem does this PR solve?

- `logging-*` e2e tests regressed deterministically on Linux / Windows / iOS after QVAC-18183 feat[api]: cancel capability + per-handler cancel scope + structured logging #2036 (M3a) installed the `oneAtATimePerModel` registry policy: `logging-all-addons-silent`, `logging-long-message`, `logging-streaming-stress`, `logging-timestamp-accuracy`, `logging-namespace-filter`.
- […] `RequestRejectedByPolicyError` (52420), citing the same "zombie" `requestId` from the just-finished `logging-persist-across-reload` completion.
- […] `main` at the same SHA.

📝 How does it solve it?

- The underlying SDK race (client `final` resolves before server `await using ctx` disposal) needs a separate SDK-side fix, tracked in QVAC-18935.
- `callWhenAddonIdle` now treats `RequestRejectedByPolicyError` as the same transient condition as the legacy llama.cpp busy throw. Both mean "slot still occupied, retry shortly", so both use the same 30 s deadline and the same 250 ms backoff.
- `runReload` joins its own triggering completion before returning, so `collectLogs`'s `Promise.race` cannot hand the next test a still-running request.

🧪 How was it tested?

- […] `main` without the fix, same `RequestRejectedByPolicyError` payload as CI run 25859103224.
- `npm run typecheck` clean in `packages/sdk/tests-qvac`; on this branch the 5 tests retry through the disposal window and pass.
- […] `AddonBusyTimeoutError` (already wired to the dep-eviction cascade) if the slot truly never frees, so a real SDK regression stays loud rather than silently retrying forever.

Refs: QVAC-18935
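The retry behaviour described in this PR (transient rejection, 250 ms backoff, 30 s deadline, loud timeout) could look roughly like the sketch below. This is not the actual `callWhenAddonIdle`: `callWhenIdleSketch`, `isTransient`, `RetryOptions` and the injectable `sleep`/`now` hooks are hypothetical, `AddonBusyTimeoutError` is a local stand-in, and the legacy llama.cpp busy throw is left out because its shape does not appear in the PR text.

```typescript
const RETRY_INTERVAL_MS = 250; // backoff quoted in the PR description
const DEADLINE_MS = 30_000; // deadline quoted in the PR description
const REQUEST_REJECTED_BY_POLICY = 52420; // error code quoted in the PR description

// Local stand-in; the real AddonBusyTimeoutError lives in the test suite.
class AddonBusyTimeoutError extends Error {}

// Hypothetical predicate: models only the policy rejection, matched by numeric code.
function isTransient(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === REQUEST_REJECTED_BY_POLICY
  );
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical options; `sleep` and `now` are injectable so the loop can be tested quickly.
interface RetryOptions {
  deadlineMs?: number;
  intervalMs?: number;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

async function callWhenIdleSketch<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const {
    deadlineMs = DEADLINE_MS,
    intervalMs = RETRY_INTERVAL_MS,
    sleep = defaultSleep,
    now = Date.now,
  } = opts;
  const deadline = now() + deadlineMs;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      // Anything that is not "slot still occupied" is a real failure: rethrow as-is.
      if (!isTransient(err)) throw err;
      // Fail loudly once another wait would overrun the deadline.
      if (now() + intervalMs > deadline) {
        throw new AddonBusyTimeoutError(`slot still busy after ${deadlineMs} ms`);
      }
      await sleep(intervalMs);
    }
  }
}
```

A call site would wrap the operation, for example `await callWhenIdleSketch(() => runCompletion())`, where `runCompletion` is a placeholder for whatever the test invokes. Bounding the loop by a deadline rather than an attempt count keeps the worst-case wait fixed even if the interval is tuned later.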