The Feed Validation workflow (.github/workflows/feed-validation.yml) has
failed on every run since it landed in PR #3872: every push to main and
every scheduled 6h run. Five root causes, all addressed here:
1. fast-xml-parser's default entity-expansion limit was tripping on
legitimate large feeds (Guardian, Fox, Axios, CISA, WHO, MIT,
Defense One, Folha, El País, iefimerida, GitHub Trending,
Dev.to, Oryx OSINT, …). We only read date strings from the
parsed doc, so processEntities:false is safe and recovers all
17 false-positive DEAD rows.
2. 10 hosts referenced from src/config/feeds.ts were absent from
the 5-file allowlist mirror set (shared/rss-allowed-domains.{json,cjs},
scripts/shared/rss-allowed-domains.json, api/_rss-allowed-domains.js,
vite.config.ts:RSS_PROXY_ALLOWED_DOMAINS). Added: abcnews.go.com +
abcnews.com (feeds.abcnews.com → abcnews.go.com → abcnews.com
two-hop chain), www.corriere.it, www.rt.com, www.alarabiya.net,
tuoitrenews.vn, www.yonhapnewstv.co.kr, www.chosun.com,
rss.libsyn.com, feeds.megaphone.fm, rss.art19.com. The same
allowlist gates the prod Edge rss-proxy, so this also silently
restores access to these feeds for live users.
3. BBC Persian was declared as plaintext http://, rejected by the
--ci https-only guard. Updated to the canonical
https://feeds.bbci.co.uk/persian/rss.xml (server-side mirror
already had this).
4. Tom's Hardware /feeds/all redirects to http://… on the same
host, tripping the per-hop https guard. The canonical https
path is /feeds.xml — switched both client and server mirrors.
5. Validator was hard-failing on any STALE-or-DEAD row, which made
the workflow noise floor unbearable: 8 stale + 32 dead = 40
reasons to be red, of which only 10 were actionable. Split the
exit policy: HARD-FAIL on config/SSRF-guard drift (allowlist
miss, plaintext URL, redirect loop) so future drift is loud,
SOFT-FAIL (exit 0 with warn) on third-party 4xx/timeouts/STALE
so a feed disappearing upstream doesn't page anyone. Promoting
third-party failures to hard-fail can wait for a registry
grooming PR.
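
The split exit policy in item 5 can be sketched roughly as follows. This is
a minimal illustration, not the validator's actual code: the failure-kind
names, the FeedFailure shape, and the exitCodeFor helper are all
hypothetical, chosen only to mirror the categories listed above.

```typescript
// Hypothetical failure kinds; the real validator's names may differ.
type FailureKind =
  | "allowlist-miss"
  | "plaintext-url"
  | "redirect-loop"
  | "http-4xx"
  | "timeout"
  | "stale";

// Hypothetical row shape for one failing feed.
interface FeedFailure {
  feed: string;
  kind: FailureKind;
}

// Config/SSRF-guard drift: fixable in this repo, so it should fail the run.
const HARD_FAIL_KINDS: ReadonlySet<FailureKind> = new Set<FailureKind>([
  "allowlist-miss",
  "plaintext-url",
  "redirect-loop",
]);

function isHardFail(failure: FeedFailure): boolean {
  return HARD_FAIL_KINDS.has(failure.kind);
}

// Returns the process exit code: 1 on any config drift, 0 otherwise.
// Third-party failures (4xx, timeouts, STALE) are only reported via `warn`.
function exitCodeFor(
  failures: FeedFailure[],
  warn: (msg: string) => void = console.warn,
): number {
  let hardCount = 0;
  for (const failure of failures) {
    if (isHardFail(failure)) {
      hardCount += 1;
    } else {
      warn(`soft-fail: ${failure.feed} (${failure.kind})`);
    }
  }
  return hardCount > 0 ? 1 : 0;
}
```

Under these assumptions a run with only third-party failures exits 0 and
logs warnings, while a single allowlist miss or plaintext URL makes the
whole run exit 1.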
Also reduces the scheduled cadence from every 6h to daily at 00:00 UTC.
Running 4× as often added no value: feed outages don't change faster
than once a day, and 542 feeds × 4 runs/day wasted runner-minutes and
third-party fetch volume.
Local validator result (after the fix):
Summary: 512 OK, 10 stale, 6 dead, 13 empty, 1 skipped
Exit: 0 (no config drift). The 6 remaining DEAD rows are all genuine
third-party state (Brasil Paralelo 404, EIA Reports 404 [duplicate
entry], News24 403, Tuoi Tre + Al Arabiya unreachable from this
network) and are candidates for a future registry-cleanup PR.
Test coverage: tests/feeds-client-server-parity.test.mjs,
tests/feed-resolution.test.mts, tests/feeds-time-gate.test.mts are
all green. The full test:data suite is also green.