QVAC-18935 test[skiplog]: retry on RequestRejectedByPolicyError in logging tests by Victor-Rodzko · Pull Request #2065 · tetherto/qvac

Victor-Rodzko · 2026-05-14T13:37:11Z

🎯 What problem does this PR solve?

5 logging-* e2e tests regressed deterministically on Linux / Windows / iOS after QVAC-18183 feat[api]: cancel capability + per-handler cancel scope + structured logging #2036 (M3a) installed the oneAtATimePerModel registry policy: logging-all-addons-silent, logging-long-message, logging-streaming-stress, logging-timestamp-accuracy, logging-namespace-filter.
All five failed in ~100 ms with RequestRejectedByPolicyError (52420), citing the same "zombie" requestId from the just-finished logging-persist-across-reload completion.
Repro: run 25859103224; same five failures reproduce 5/5 locally on main at the same SHA.

📝 How does it solve it?

Test-side mitigation only — the underlying SDK race (client final resolves before server await using ctx disposal) needs a separate SDK-side fix, tracked in QVAC-18935.
callWhenAddonIdle now treats RequestRejectedByPolicyError as the same transient condition as the legacy llama.cpp busy throw. Both mean "slot still occupied, retry shortly" — same 30 s deadline, same 250 ms backoff.
runReload joins its own triggering completion before returning, so collectLogs's Promise.race cannot hand the next test a still-running request.

🧪 How was it tested?

Local repro: 5/5 failures on main without the fix, same RequestRejectedByPolicyError payload as CI run 25859103224.
With fix: npm run typecheck clean in packages/sdk/tests-qvac; on this branch the 5 tests retry through the disposal window and pass.
Defense-in-depth: timeout-bounded retry surfaces as AddonBusyTimeoutError (already wired to the dep-eviction cascade) if the slot truly never frees, so a real SDK regression stays loud rather than silently retrying forever.

Refs: QVAC-18935

After PR #2036 (M3a) installed the `oneAtATimePerModel` policy on the completion registry, back-to-back `completion({ modelId })` calls hit a small disposal window where the prior request's slot is still alive in `RequestRegistry`. Five `logging-*` e2e tests started failing in ~100 ms with `RequestRejectedByPolicyError` (52420), each immediately after `logging-persist-across-reload`. Test-side mitigation only: - `callWhenAddonIdle` retries on both the legacy llama.cpp busy throw and the new typed policy rejection. - `runReload` joins its own triggering completion before returning so `collectLogs`'s `Promise.race` cannot hand the next test a still-running slot. The underlying SDK race (client `final` resolves before server `await using ctx` disposal) needs a separate SDK-side fix; see QVAC-18935. Refs: QVAC-18935

github-actions · 2026-05-14T14:12:02Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

Server errors cross the RPC boundary as `RPCError`, so `instanceof RequestRejectedByPolicyError` is always false on the client. CI on PR #2065 confirmed this: `addon-logging-during-inference` failed in 204 ms (< 250 ms retry interval), meaning the first attempt's error was not recognised as transient. Match by `code === SDK_SERVER_ERROR_CODES.REQUEST_REJECTED_BY_POLICY` (52420) — the documented client-side check pattern. Refs: QVAC-18935

simon-iribarren · 2026-05-14T14:26:05Z

Non-blocking, approve stands — but since you mentioned still seeing some Linux flake, two thoughts that might help:

1. The runReload race shape lives in three other handlers too. collectLogs does Promise.race([collectPromise, triggerPromise.then(() => sleep(postTriggerWaitMs))]), so the collect-vs-trigger race that motivated this fix also exists in runAddonLogging, runDuringInference, and runConcurrent — all three pass a trigger that wraps callWhenAddonIdle/runCompletion, and the collector can win the race while the server-side await using ctx is still disposing. runReload is the only call site exposed deterministically (because it unloads/reloads then immediately hands a fresh model to the next test), but if Linux is intermittently flaking on the other four tests too, those three are the next suspects.

Rather than open-code the let triggerPromise = Promise.resolve(); … await triggerPromise.catch(() => {}) join four times, you could lift the join into collectLogs itself so every caller gets it for free:

async function collectLogs(opts: CollectLogsOptions): Promise<CollectLogsResult> {
  // ... existing setup ...
  let triggerPromise: Promise<void> = Promise.resolve();
  const triggerWrapper = (async () => {
    await sleep(STREAM_OPEN_DELAY_MS + preTriggerExtraWaitMs);
    if (trigger) {
      triggerStartMs = Date.now();
      triggerPromise = trigger();
      await triggerPromise;
    }
  })();

  await Promise.race([collectPromise, triggerWrapper.then(() => sleep(postTriggerWaitMs))]);
  await triggerPromise.catch(() => {});

  return { logs, passed: logs.length > 0, output: /* … */ };
}

Closes the race for all four handlers, and runReload drops back to its pre-PR shape.

2. Tiny readability nit. The await triggerPromise.catch(() => {}) swallow is intentional (result is already decided by the race; callWhenAddonIdle already converts true never-free conditions into AddonBusyTimeoutError), but the SDK convention is "no .catch(() => {})". A one-line comment above it (e.g. // drain: result already decided by race; AddonBusyTimeoutError still surfaces real regressions) would head off future readers. If you fold (1), this goes away naturally.

Either way the fix is correctly scoped and the approval stands.

github-actions · 2026-05-14T14:33:45Z

QVAC E2E — `android` — ⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: Device Farm logs

The test job did not produce a results artifact. Check the run for job-level failures.

github-actions · 2026-05-14T14:33:45Z

QVAC E2E — `ios` — ⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

github-actions · 2026-05-14T14:33:45Z

QVAC E2E — `windows` — ✅ all tests passed (91/91, 403s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

github-actions · 2026-05-14T14:33:47Z

QVAC E2E — `linux` — ✅ all tests passed (91/91, 343s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

github-actions · 2026-05-14T14:33:49Z

QVAC E2E — `macos` — ⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

…ejection

kinsta · 2026-05-14T15:34:47Z

Preview deployments for qvac-docs-staging ⚡️

Status	Branch preview	Commit preview
🔁 Deploying...	N/A	N/A

Commit: a62b507c72b0f4db9c68b50b56504f39777424b8

Deployment ID: 599ed843-3213-4e89-b3fe-6a40f55b0157

Static site name: qvac-docs-staging-fazwv

Victor-Rodzko requested review from a team as code owners May 14, 2026 13:37

Victor-Rodzko force-pushed the test/QVAC-18935-logging-flake-retry-policy-rejection branch from 92ae29e to 07b7076 Compare May 14, 2026 13:43

Victor-Rodzko added verify tier1 test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] verified Authorize secrets / label-gate in PR workflows labels May 14, 2026

Victor-Rodzko temporarily deployed to release May 14, 2026 13:45 — with GitHub Actions Inactive

This comment has been minimized.

Sign in to view

Victor-Rodzko had a problem deploying to release May 14, 2026 13:56 — with GitHub Actions Failure

Victor-Rodzko temporarily deployed to release May 14, 2026 13:56 — with GitHub Actions Inactive

Victor-Rodzko had a problem deploying to release May 14, 2026 13:57 — with GitHub Actions Failure

Victor-Rodzko temporarily deployed to release May 14, 2026 13:57 — with GitHub Actions Inactive

simon-iribarren previously approved these changes May 14, 2026

View reviewed changes

NamelsKing previously approved these changes May 14, 2026

View reviewed changes

Victor-Rodzko dismissed stale reviews from NamelsKing and simon-iribarren via a0334b6 May 14, 2026 14:19

Victor-Rodzko temporarily deployed to release May 14, 2026 14:19 — with GitHub Actions Inactive

Victor-Rodzko added test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] and removed test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only] labels May 14, 2026

Victor-Rodzko temporarily deployed to release May 14, 2026 14:33 — with GitHub Actions Inactive

Victor-Rodzko had a problem deploying to release May 14, 2026 14:33 — with GitHub Actions Error

Victor-Rodzko temporarily deployed to release May 14, 2026 14:33 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 14, 2026 14:38 — with GitHub Actions Inactive

Victor-Rodzko had a problem deploying to release May 14, 2026 14:38 — with GitHub Actions Failure

Victor-Rodzko temporarily deployed to release May 14, 2026 14:45 — with GitHub Actions Inactive

Victor-Rodzko had a problem deploying to release May 14, 2026 14:45 — with GitHub Actions Failure

Victor-Rodzko temporarily deployed to release May 14, 2026 15:00 — with GitHub Actions Inactive

Victor-Rodzko temporarily deployed to release May 14, 2026 15:07 — with GitHub Actions Inactive

NamelsKing approved these changes May 14, 2026

View reviewed changes

lauripiisang approved these changes May 14, 2026

View reviewed changes

Merge branch 'main' into test/QVAC-18935-logging-flake-retry-policy-r…

a62b507

…ejection

kinsta Bot deployed to preview May 14, 2026 15:34 View deployment

Victor-Rodzko temporarily deployed to release May 14, 2026 15:35 — with GitHub Actions Inactive

Victor-Rodzko merged commit 8ca7bd3 into main May 14, 2026
25 checks passed

Victor-Rodzko deleted the test/QVAC-18935-logging-flake-retry-policy-rejection branch May 14, 2026 15:39

Victor-Rodzko temporarily deployed to release May 14, 2026 15:39 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QVAC-18935 test[skiplog]: retry on RequestRejectedByPolicyError in logging tests#2065

QVAC-18935 test[skiplog]: retry on RequestRejectedByPolicyError in logging tests#2065
Victor-Rodzko merged 3 commits into
mainfrom
test/QVAC-18935-logging-flake-retry-policy-rejection

Victor-Rodzko commented May 14, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

simon-iribarren commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

kinsta Bot commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Victor-Rodzko commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎯 What problem does this PR solve?

📝 How does it solve it?

🧪 How was it tested?

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Tier-based Approval Status

Uh oh!

simon-iribarren commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QVAC E2E — android — ⚠️ no results

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QVAC E2E — ios — ⚠️ no results

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QVAC E2E — windows — ✅ all tests passed (91/91, 403s)

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QVAC E2E — linux — ✅ all tests passed (91/91, 343s)

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QVAC E2E — macos — ⚠️ no results

Uh oh!

kinsta Bot commented May 14, 2026

Preview deployments for qvac-docs-staging ⚡️

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Victor-Rodzko commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

QVAC E2E — `android` — ⚠️ no results

github-actions Bot commented May 14, 2026 •

edited

Loading

QVAC E2E — `ios` — ⚠️ no results

github-actions Bot commented May 14, 2026 •

edited

Loading

QVAC E2E — `windows` — ✅ all tests passed (91/91, 403s)

github-actions Bot commented May 14, 2026 •

edited

Loading

QVAC E2E — `linux` — ✅ all tests passed (91/91, 343s)

github-actions Bot commented May 14, 2026 •

edited

Loading

QVAC E2E — `macos` — ⚠️ no results