
fix: retry csv parity check once to tolerate cloudflare cache propagation#299

Merged: rudrakshbhandari merged 3 commits into main from upbeat-yalow-d2375f on Jun 3, 2026



@rudrakshbhandari

Owner

Summary

  • After a snapshot publish, the stats API returns the new metadata immediately, but Cloudflare may still serve the previous districts.csv for a short propagation window (~10–30s)
  • assertCsvMetadataParity treated this as a hard failure, intermittently firing the nyaaywatch-production-public-alpha-ops CloudWatch alarm (three times on 2026-05-29, at 11:27, 14:01, and 22:57 UTC)
  • On first mismatch, the check now waits 15s and re-fetches districts.csv before failing; a genuine long-lived drift still fails on the second attempt
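A minimal TypeScript sketch of the retry-once pattern described above, not the PR's actual code. `checkWithOneRetry` and `sleep` are hypothetical names; in the real change the check is assertCsvMetadataParity against a re-fetched districts.csv, and the production delay is 15s.

```typescript
// Hypothetical helper: resolve a promise after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run an async fetch-and-check; if it throws, wait once
// and run it again. A failure on the second attempt is NOT caught, so a
// genuine long-lived drift still surfaces as an error.
async function checkWithOneRetry<T>(
  fetchAndCheck: () => Promise<T>,
  retryDelayMs: number,
): Promise<T> {
  try {
    return await fetchAndCheck();
  } catch {
    // First failure: assume a cache propagation window and try exactly once more.
    await sleep(retryDelayMs);
    return await fetchAndCheck();
  }
}
```

Retrying exactly once keeps the alarm meaningful: a transient mismatch costs one extra fetch and a short delay, while a persistent mismatch fails after a bounded wait instead of being retried indefinitely.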

Changes

  • src/dev/release-verification.ts: wrap assertCsvMetadataParity in a try/catch; on failure, sleep(retryDelayMs) then re-fetch and re-check
  • csvParityRetryDelayMs option added to verifyPublicRelease so tests can pass 0 and skip the real delay
  • tests/release-verification.test.ts: two new tests — retry succeeds when the second fetch is consistent; retry propagates the error when both fetches are stale
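A sketch of how an injectable delay option and the two test scenarios might fit together. Only the option name csvParityRetryDelayMs comes from this PR; `ParityOptions`, `verifyCsvParity`, `assertParity`, `fakeCsvFetcher`, and modelling the CSV as a bare snapshot id are hypothetical simplifications.

```typescript
// Hypothetical option shape; the real verifyPublicRelease options differ.
interface ParityOptions {
  expectedSnapshotId: string;                // snapshot id reported by the stats API
  fetchCsvSnapshotId: () => Promise<string>; // fetches districts.csv, returns its snapshot id
  csvParityRetryDelayMs?: number;            // defaults to 15s; tests pass 0
}

const DEFAULT_CSV_PARITY_RETRY_DELAY_MS = 15_000;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

function assertParity(expected: string, actual: string): void {
  if (expected !== actual) {
    throw new Error(`CSV parity mismatch: expected ${expected}, got ${actual}`);
  }
}

async function verifyCsvParity(opts: ParityOptions): Promise<void> {
  const delayMs = opts.csvParityRetryDelayMs ?? DEFAULT_CSV_PARITY_RETRY_DELAY_MS;
  try {
    assertParity(opts.expectedSnapshotId, await opts.fetchCsvSnapshotId());
  } catch {
    // Any first-attempt failure (including a fetch error) gets one retry.
    await sleep(delayMs);
    // Second attempt: errors propagate to the caller.
    assertParity(opts.expectedSnapshotId, await opts.fetchCsvSnapshotId());
  }
}

// Test helper: returns the queued snapshot ids in order, repeating the last one.
function fakeCsvFetcher(ids: string[]): () => Promise<string> {
  let i = 0;
  return async () => ids[Math.min(i++, ids.length - 1)];
}
```

With csvParityRetryDelayMs: 0, `fakeCsvFetcher(["snap-1", "snap-2"])` models a stale first fetch followed by a consistent one (the check passes), and `fakeCsvFetcher(["snap-1", "snap-1"])` models two stale fetches (the mismatch error propagates), without the test waiting 15s.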

Test plan

  • All 8 release-verification unit tests pass (vitest run tests/release-verification.test.ts)
  • Monitor the nyaaywatch-production-public-alpha-ops alarm after deploy — expect no more OK→ALARM transitions caused by transient parity windows

…tion

After a snapshot publish, the stats API immediately returns the new snapshot
metadata while Cloudflare may still serve the previous districts.csv for a
short window. This caused transient parity mismatches that fired the production
public-alpha-ops alarm (nyaaywatch-production-public-alpha-ops-alerts).

On the first mismatch, wait 15 s and re-fetch districts.csv before failing.
A genuine drift (not a propagation race) will still fail on the second attempt.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19170547c9


Comment thread on src/dev/release-verification.ts (outdated)
@github-actions (bot) commented May 30, 2026

Preview closed.

Service: nyaaywatch-pr-299

@rudrakshbhandari rudrakshbhandari enabled auto-merge (squash) May 30, 2026 03:58
…e dup

Codex P2: the CSV parity retry replaced the response body but only re-checked
metadata parity, leaving publicDataCacheProtected based on the first (stale)
response. A post-propagation CSV missing no-store could pass. Now re-run
assertCacheProtection on the retried response before accepting it, since that is
the CSV the release ultimately validates against.

Also drop a duplicate sleep() introduced by merging main (main added its own
sleep for the transient-retry helper); keep the canonical one.

Regression test: first districts.csv is cache-protected but parity-stale, the
retried response is parity-consistent but cacheable -> release still fails.
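A sketch of the fix described in this commit: both checks run against whichever districts.csv response is finally accepted, so a retried response cannot inherit the first response's cache-protection result. `CsvResponse`, `fetchVerifiedCsv`, `queuedFetcher`, and treating "cache-protected" as "Cache-Control contains no-store" are assumptions for illustration; assertCacheProtection and assertMetadataParity here are simplified stand-ins, not the repository's functions.

```typescript
// Hypothetical model of a fetched districts.csv.
interface CsvResponse {
  snapshotId: string;          // snapshot the CSV body belongs to
  cacheControl: string | null; // Cache-Control response header
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

function assertMetadataParity(expected: string, res: CsvResponse): void {
  if (res.snapshotId !== expected) {
    throw new Error(`CSV parity mismatch: expected ${expected}, got ${res.snapshotId}`);
  }
}

// Assumption: "cache-protected" means Cache-Control includes the no-store directive.
function assertCacheProtection(res: CsvResponse): void {
  const directives = (res.cacheControl ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  if (!directives.includes("no-store")) {
    throw new Error("districts.csv is cacheable (missing no-store)");
  }
}

async function fetchVerifiedCsv(
  expectedSnapshotId: string,
  fetchCsv: () => Promise<CsvResponse>,
  retryDelayMs: number,
): Promise<CsvResponse> {
  let res = await fetchCsv();
  try {
    assertMetadataParity(expectedSnapshotId, res);
  } catch {
    await sleep(retryDelayMs);
    res = await fetchCsv(); // the retried response replaces the stale one
    assertMetadataParity(expectedSnapshotId, res);
  }
  // Runs on the response actually accepted, which may be the retried one.
  assertCacheProtection(res);
  return res;
}

// Test helper: returns queued responses in order, repeating the last one.
function queuedFetcher(responses: CsvResponse[]): () => Promise<CsvResponse> {
  let i = 0;
  return async () => responses[Math.min(i++, responses.length - 1)];
}
```

This mirrors the regression scenario: a first response with `no-store` but a stale snapshot, followed by a retried response with the right snapshot but `public, max-age=300`, still makes fetchVerifiedCsv reject.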
@rudrakshbhandari rudrakshbhandari merged commit 7cfb99c into main Jun 3, 2026
4 checks passed
@rudrakshbhandari rudrakshbhandari deleted the upbeat-yalow-d2375f branch June 3, 2026 03:32