fix: retry csv parity check once to tolerate cloudflare cache propagation #299
Merged
…tion

After a snapshot publish, the stats API immediately returns the new snapshot metadata while Cloudflare may still serve the previous districts.csv for a short window. This caused transient parity mismatches that fired the production public-alpha-ops alarm (nyaaywatch-production-public-alpha-ops-alerts).

On the first mismatch, wait 15 s and re-fetch districts.csv before failing. A genuine drift (not a propagation race) will still fail on the second attempt.
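The retry-once behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: `checkParityWithOneRetry`, `CsvFetcher`, and `ParityCheck` are hypothetical names, and the real implementation in `src/dev/release-verification.ts` may be shaped differently.

```typescript
// Hypothetical shapes for illustration only; the real types may differ.
type CsvFetcher = () => Promise<string>;
type ParityCheck = (csv: string) => void; // throws on a parity mismatch

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run the parity check. On the first failure, wait once, re-fetch the CSV,
// and re-check. A second failure propagates to the caller unchanged.
async function checkParityWithOneRetry(
  fetchCsv: CsvFetcher,
  assertParity: ParityCheck,
  retryDelayMs = 15_000, // 15 s, as in the commit message
): Promise<string> {
  const first = await fetchCsv();
  try {
    assertParity(first);
    return first;
  } catch {
    await sleep(retryDelayMs);
    const second = await fetchCsv();
    assertParity(second); // a genuine drift still throws here
    return second;
  }
}
```

Because there is exactly one retry, a long-lived mismatch costs one extra delay and then fails as before, so the alarm keeps firing for real drift.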
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19170547c9
…e dup

Codex P2: the CSV parity retry replaced the response body but only re-checked metadata parity, leaving publicDataCacheProtected based on the first (stale) response. A post-propagation CSV missing no-store could pass. Now re-run assertCacheProtection on the retried response before accepting it, since that is the CSV the release ultimately validates against.

Also drop a duplicate sleep() introduced by merging main (main added its own sleep for the transient-retry helper); keep the canonical one.

Regression test: the first districts.csv is cache-protected but parity-stale, the retried response is parity-consistent but cacheable -> release still fails.
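The ordering fix described above can be illustrated with a small sketch. Everything here is an assumption made for the example: the `CsvResponse` shape, the rule that cache protection means a `no-store` directive, and the `verifyCsv` wrapper are hypothetical, and only the name `assertCacheProtection` comes from the commit message.

```typescript
// Hypothetical response shape; the real code presumably inspects fetch responses.
interface CsvResponse {
  body: string;
  cacheControl: string | null; // value of the Cache-Control header, if any
}

// Assumed rule: the CSV must be served with a no-store directive.
function assertCacheProtection(res: CsvResponse): void {
  const directives = (res.cacheControl ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  if (!directives.includes("no-store")) {
    throw new Error("districts.csv is missing Cache-Control: no-store");
  }
}

async function verifyCsv(
  fetchCsv: () => Promise<CsvResponse>,
  assertParity: (body: string) => void, // throws on mismatch
  retryDelayMs: number,
): Promise<CsvResponse> {
  let res = await fetchCsv();
  assertCacheProtection(res);
  try {
    assertParity(res.body);
  } catch {
    await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
    res = await fetchCsv();
    // Re-run the cache check on the retried response: this is the CSV the
    // release is ultimately validated against.
    assertCacheProtection(res);
    assertParity(res.body);
  }
  return res;
}
```

Without the second `assertCacheProtection` call, a retried response that is parity-consistent but cacheable would be accepted on the strength of the first response's headers.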
Summary
- After a snapshot publish, Cloudflare may still serve the previous `districts.csv` for a short propagation window (~10–30s)
- `assertCsvMetadataParity` was detecting this as a hard failure, firing the `nyaaywatch-production-public-alpha-ops` CloudWatch alarm intermittently (3 times on 2026-05-29 at 11:27, 14:01, 22:57 UTC)
- On the first mismatch, wait and re-fetch `districts.csv` before failing; a genuine long-lived drift still fails on the second attempt

Changes
- `src/dev/release-verification.ts`: wrap `assertCsvMetadataParity` in a try/catch; on failure, `sleep(retryDelayMs)` then re-fetch and re-check
- `csvParityRetryDelayMs` option added to `verifyPublicRelease` so tests can pass `0` without any actual delay
- `tests/release-verification.test.ts`: two new tests: retry succeeds when the second fetch is consistent; retry propagates the error when both fetches are stale

Test plan
- `release-verification` unit tests pass (`vitest run tests/release-verification.test.ts`)
- `nyaaywatch-production-public-alpha-ops` alarm after deploy: expect no more OK→ALARM transitions caused by transient parity windows
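For the `csvParityRetryDelayMs` option, one detail worth sketching is how a delay of `0` can be honoured in tests. The snippet below is a hypothetical illustration: `VerifyOptions`, `resolveRetryDelay`, `sequenceFetcher`, and the 15 s default are assumptions, not the PR's actual code.

```typescript
// Hypothetical options shape; only csvParityRetryDelayMs is named in the PR.
interface VerifyOptions {
  csvParityRetryDelayMs?: number;
}

// Assumed default, taken from the "wait 15 s" wording in the commit message.
const DEFAULT_CSV_PARITY_RETRY_DELAY_MS = 15_000;

// Use ?? rather than || so that an explicit 0 from a test is respected
// instead of being treated as "unset".
function resolveRetryDelay(options: VerifyOptions = {}): number {
  return options.csvParityRetryDelayMs ?? DEFAULT_CSV_PARITY_RETRY_DELAY_MS;
}

// Test helper: returns a fetcher that yields the given responses in order
// and throws if the code under test fetches more often than expected.
function sequenceFetcher<T>(responses: T[]): () => Promise<T> {
  let index = 0;
  return async () => {
    if (index >= responses.length) {
      throw new Error("unexpected extra fetch");
    }
    return responses[index++];
  };
}
```

A test for the "retry succeeds" case could pass `sequenceFetcher(["stale csv", "fresh csv"])` together with `{ csvParityRetryDelayMs: 0 }`, and the "both stale" case could pass two stale bodies and expect the parity error to propagate.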