fix: eliminate flaky 404s in link/QR live tests and fix qrCodeBytes type#143

Merged
jaredwray merged 1 commit into main from claude/vigilant-cori-y87lpe
Jun 12, 2026

@jaredwray

Summary

pnpm test intermittently fails with "Error: Fetch failed with status 404" in the link/QR integration tests. This PR audits the QR code area, fixes the root causes of the flakes, and fixes a real QR bug the audit turned up.

Audit findings

1. The stats test raced concurrent CI runs (primary 404 source)

The test "should get code stats for a short code" listed the org's short codes and called getCodeStats on a random one. Three things make that race likely, not rare:

  • The list endpoint returns newest-first.
  • The shared test organization has ~1,400 accumulated sdk-test codes (leaked by past failed runs), and the newest entries are the temporary codes of whatever test runs are in flight right now.
  • Every PR push runs 4 suites in parallel against the same organization (tests.yaml Node 22/24/26 matrix + code-coverage.yaml), each deleting its temp codes within seconds.

So the random pick frequently landed on another run's code that was deleted before getCodeStats resolved → 404. Verified live: GET .../codes/{id}/stats on a deleted code returns 404.

Fix: the test now creates its own short code, gets stats for it, and cleans it up. (Verified live that a fresh, zero-click code returns 200 with all stats fields populated.)
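
The self-contained pattern can be sketched against an in-memory stand-in rather than the real SDK. FakeLinkApi, statsForOwnCode, and every signature below are assumptions made for illustration; only the method names createShortCode, getCodeStats, and deleteShortCode come from this PR.

```typescript
// In-memory stand-in for the live API (hypothetical; not the SDK's real client).
class FakeLinkApi {
  private codes = new Map<string, { clicks: number }>();
  private nextId = 1;

  async createShortCode(): Promise<string> {
    const id = `sdk-test-${this.nextId++}`;
    this.codes.set(id, { clicks: 0 });
    return id;
  }

  async getCodeStats(id: string): Promise<{ clicks: number }> {
    const stats = this.codes.get(id);
    // Mirrors the failure described above: stats for a deleted code yield a 404.
    if (!stats) throw new Error("Fetch failed with status 404");
    return stats;
  }

  async deleteShortCode(id: string): Promise<boolean> {
    return this.codes.delete(id);
  }
}

// The test owns the whole lifecycle of the code it inspects, so no other run
// can delete that code between the create and the stats call.
async function statsForOwnCode(api: FakeLinkApi): Promise<{ clicks: number }> {
  const id = await api.createShortCode();
  try {
    return await api.getCodeStats(id);
  } finally {
    await api.deleteShortCode(id);
  }
}
```

Because the code under test is never taken from the shared list, the newest-first ordering and the leaked sdk-test codes no longer influence the outcome.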

2. Eventual consistency after writes

Reads, QR operations, and deletes that run immediately after createShortCode can transiently 404 before the write is visible — the suite already worked around this in one test with a 500 ms sleep and a comment. All live calls now go through a small retry helper with backoff, and list assertions poll until the created items are visible instead of asserting on the first response.
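
A minimal sketch of what such a retry helper and list poll could look like. The PR text does not show the actual helper, so the names withRetry and pollUntil, the option names, and the default values are all assumptions.

```typescript
// Hypothetical options; the defaults are illustrative, not taken from the PR.
type RetryOptions = {
  retries?: number; // additional attempts after the first one
  baseDelayMs?: number; // wait before the first retry
  factor?: number; // multiplier applied to the wait after each failure
};

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-runs the operation with exponential backoff and rethrows the last error
// once the attempts are exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  { retries = 3, baseDelayMs = 250, factor = 2 }: RetryOptions = {},
): Promise<T> {
  let delay = baseDelayMs;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await sleep(delay);
        delay *= factor;
      }
    }
  }
  throw lastError;
}

// Polls a read until the predicate holds, instead of asserting on the first response.
async function pollUntil<T>(
  fetchValue: () => Promise<T>,
  isReady: (value: T) => boolean,
  options: RetryOptions = {},
): Promise<T> {
  return withRetry(async () => {
    const value = await fetchValue();
    if (!isReady(value)) throw new Error("condition not met yet");
    return value;
  }, options);
}
```

With these defaults a call waits 250 ms, 500 ms, and 1000 ms between its four attempts, so a write that becomes visible within roughly two seconds is tolerated.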

3. Cleanup failures failed unrelated tests

Most tests asserted that their cleanup deleteShortCode(...) returned true, so a transient cleanup 404 failed a test whose subject behavior had already passed. Cleanup is now best-effort (retry + warn, the pattern the suite already used in one place), and explicit delete coverage lives in dedicated tests ("should create and delete a short code", "should delete a QR code by ID").
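
Best-effort cleanup might look roughly like the sketch below. The helper name bestEffortDelete, its parameters, and the warning text are assumptions; the PR only states that cleanup now retries and warns instead of asserting.

```typescript
// Hypothetical helper: retries a cleanup delete, then warns instead of throwing,
// so a transient cleanup failure cannot fail a test whose subject already passed.
async function bestEffortDelete(
  label: string,
  deleteFn: () => Promise<boolean>,
  attempts = 3,
  delayMs = 250,
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await deleteFn()) return true;
    } catch (error) {
      // Ignored on purpose: a failed cleanup is reported below, not thrown.
    }
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  console.warn(`cleanup failed for ${label} after ${attempts} attempts`);
  return false;
}
```

A test would call this from its teardown and ignore the return value, while the dedicated delete tests keep asserting on deleteShortCode directly.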

4. qrCodeBytes had the wrong typed array (QR bug)

createQrCode, getQrCode, and getQrCodes decoded the base64 PNG with new Uint16Array(buffer), which widens every byte to a 16-bit element — writing those bytes out produces a corrupted, double-size image. Changed to Uint8Array (type + 3 call sites). ⚠️ This changes the public CreateQrCodeResponse.qrCodeBytes type; it's a bugfix, but worth flagging in release notes.
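
The size doubling is easy to reproduce in isolation. The sketch below assumes the decoded buffer behaves like a Uint8Array (as a Node.js Buffer does) and uses the eight-byte PNG signature as sample data; it is not the SDK's code.

```typescript
// The eight-byte PNG signature, standing in for a decoded base64 PNG.
const decoded = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Constructing a Uint16Array from another typed array copies element by element,
// so each byte becomes its own 16-bit element.
const widened = new Uint16Array(decoded);
console.log(widened.length, widened.byteLength); // 8 16

// A Uint8Array keeps one byte per element.
const asBytes = new Uint8Array(decoded);
console.log(asBytes.length, asBytes.byteLength); // 8 8

// Writing the widened array's underlying memory yields twice as many bytes,
// with a padding byte next to every original byte.
const rawOutput = new Uint8Array(widened.buffer);
console.log(rawOutput.length); // 16
```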

Should we run the tests in parallel?

No — parallelism is the cause here, not the cure. Within one run, vitest already runs test files in parallel and tests within a file sequentially, which is fine because only link.test.ts touches the link API. The conflicts come from multiple suites sharing one organization (CI matrix + coverage workflow + local runs). With every test now self-contained (operating only on codes it created) plus retries for eventual consistency, the suite is safe under that existing parallelism.

Verification

  • pnpm test: 235 tests pass, lint clean, statements and branches back at 100%.
  • 3 full link.test.ts suites run concurrently (simulating the CI matrix): all pass.
  • pnpm build and tsc --noEmit pass.
  • Live probes confirmed: fresh-code stats → 200; deleted-code stats → 404; QR create immediately after code create → 201.

Follow-up recommendations (not in this PR)

  • The shared org has ~1,400 leaked sdk-test codes from past failed runs. Consider a scheduled sweeper for sdk-test-tagged codes older than an hour, or a dedicated org for CI.
  • Non-2xx responses surface as @cacheable/net's generic "Fetch failed with status N", so the SDK's contextual "Failed to create QR code: ..." branches are unreachable (they are excluded from coverage with v8 ignore). Wrapping network calls to add operation context would make future failures much easier to attribute.
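
One possible shape for the wrapper suggested in the second recommendation, as a sketch only. The name withOperationContext and the message format are assumptions, and nothing here is part of the SDK or of @cacheable/net.

```typescript
// Hypothetical wrapper: rethrows any failure with the name of the operation
// prepended, so a generic status error can be attributed to a specific call.
async function withOperationContext<T>(
  operation: string,
  call: () => Promise<T>,
): Promise<T> {
  try {
    return await call();
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Failed to ${operation}: ${reason}`);
  }
}
```

Wrapping a QR creation call this way would turn a bare "Fetch failed with status 404" into "Failed to create QR code: Fetch failed with status 404".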

https://claude.ai/code/session_01Jw4fk5in8dEyFKCs3pLY9g


Generated by Claude Code

The link integration tests intermittently failed with 'Fetch failed with
status 404'. Audit findings and fixes:

- The stats test listed the organization's short codes (newest first) and
  picked a random one. Concurrent CI runs (Node version matrix plus the
  coverage workflow) share the organization and delete their temporary
  codes within seconds, so the picked code often vanished before
  getCodeStats resolved. The test now creates and uses its own code.
- Operations that run immediately after a create can transiently 404
  while the API is eventually consistent. All live calls now go through a
  retry helper with backoff (replacing the one-off 500ms sleep), and list
  assertions poll until the created items are visible.
- Cleanup deletes were asserted, so a transient cleanup failure failed
  tests whose subject had already passed. Cleanup is now best-effort with
  a warning; dedicated delete tests assert the delete paths explicitly.

Also fixes qrCodeBytes to be Uint8Array: new Uint16Array(buffer) widened
each byte to 16 bits, which corrupts the QR image when the bytes are
written out.

https://claude.ai/code/session_01Jw4fk5in8dEyFKCs3pLY9g
@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (a37c236) to head (f6c2eee).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #143   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            8         8           
  Lines          492       492           
  Branches       105       101    -4     
=========================================
  Hits           492       492           

@jaredwray jaredwray merged commit 0c26338 into main Jun 12, 2026
9 checks passed
@jaredwray jaredwray deleted the claude/vigilant-cori-y87lpe branch June 12, 2026 17:45