Problem
Our broken-link check (Website URLs workflow + docs/.vitepress/test-links.sh) runs HTML-Proofer with a blanket --ignore-status-codes list that includes 0. Status 0 is HTML-Proofer's catch-all for "couldn't get a real HTTP response" — DNS resolution failure, TLS handshake failure, connection refused, and timeouts. Suppressing it means we silently pass genuinely dead links, which defeats the entire purpose of the check.
Concrete examples found while triaging PR #2106:
- https://v2.dotnet.stockindicators.dev has no DNS record (returns status 0). It is currently linked on every page via the "Legacy docs (v2)" nav item and passes silently.
- https://school.stockcharts.com/doku.php?id=technical_indicators:kaufman_s_adaptive_moving_average is dead (status 0). It was passing only because 0 is exempted.
Current config (restored in PR #2106 to get CI green short-term):
--ignore-status-codes "0,302,402,403,406,408,415,429,502,503,999"
--ignore-urls "/fonts.gstatic.com/,/github\.com\/DaveSkender\/Stock\.Indicators\/(blob|tree)\//"
Goal
Catch legitimately dead links again while not flapping on the noise that forced the blanket exemptions in the first place:
- Keep suppressing genuine bot-blocking / rate-limiting that we can't fix: 401/402/403/406/429/999, and the GitHub 429s on auto-generated edit/blob/tree/discussions links. (We originally exempted the GitHub Discussion links specifically due to 429 rate-limiting.)
- Stop blanket-suppressing 0 so DNS/dead-host failures surface again.
Tension to solve
0 is overloaded: it covers both "this host is dead" (want to fail) and "the request timed out / TLS hiccup / transient network" (don't want to flake). Any solution needs to separate these.
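One way to picture the separation is a small pre-check that looks at why the request failed instead of collapsing everything into a single status. The sketch below is only an illustration and is not part of the current workflow: it assumes a curl-based probe run outside HTML-Proofer, the function names `classify_curl_exit` and `check_with_retry` are hypothetical, and the dead-versus-transient policy is an assumption that would need tuning.

```shell
#!/bin/sh
# Seconds to wait between retries; override via the environment.
RETRY_DELAY="${RETRY_DELAY:-5}"

# Map a curl exit code to a verdict.
# The dead/transient split is an illustrative policy, not an established rule.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;         # transfer completed (any HTTP status)
    6)  echo "dead" ;;       # could not resolve host (DNS)
    7)  echo "dead" ;;       # failed to connect (e.g. connection refused)
    28) echo "transient" ;;  # operation timed out
    35) echo "transient" ;;  # TLS/SSL connect error
    *)  echo "unknown" ;;
  esac
}

# check_with_retry ATTEMPTS PROBE [ARGS...]
# Retries only transient failures; dead or unknown results fail immediately.
check_with_retry() {
  max_attempts="$1"; shift
  attempt=1
  while :; do
    rc=0
    "$@" || rc=$?
    case "$(classify_curl_exit "$rc")" in
      ok) return 0 ;;
      transient)
        if [ "$attempt" -ge "$max_attempts" ]; then
          return 1           # persistent timeout: treat as a failure
        fi
        attempt=$((attempt + 1))
        sleep "$RETRY_DELAY"
        ;;
      *) return 1 ;;
    esac
  done
}

# Example against a real URL (needs network, so it is left commented out):
# check_with_retry 3 curl --silent --head --output /dev/null --max-time 10 \
#   "https://v2.dotnet.stockindicators.dev"
```

With this shape, a host with no DNS record fails on the first attempt, while a slow host gets a few chances before it is reported.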
Candidate approaches
1. Per-host/per-URL allowlist instead of blanket status codes. Drop 0 from --ignore-status-codes; explicitly --ignore-urls the known bot-blocking domains (investopedia, medium, coinbase, codeburst, jstor, forex-station, etc.) and the GitHub auto-generated paths. Pros: dead hosts surface. Cons: allowlist needs occasional maintenance; timeouts on otherwise-good hosts can still flake.
2. Add retry/backoff + timeout tuning (HTML-Proofer --hydra/typhoeus options, or wrap with retries) so transient 0s self-heal, then fail on persistent 0. Reduces flake from genuine-but-slow hosts.
3. Switch tools to lychee. Native retry, response caching, per-status --accept, regex excludes, and a maintained excludes set for bot-blockers. Generally better signal-to-noise than HTML-Proofer for external links.
4. Split internal vs external checks. Keep internal-link + curated-external checking blocking on PRs (cheap, deterministic); move full external sweep to a scheduled (cron) job that opens/updates an issue on dead links. PRs stop flaking on third-party outages while dead links still get reported.
5. Suppress search-path referral URLs by pattern (e.g. *google.com/search*). We intentionally use Google-search referrals where a stable canonical source doesn't exist (kama.md, getting-started.md); a small regex keeps those out of scope rather than relying on status codes.
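The allowlist and search-referral approaches both come down to maintaining a list of URL patterns rather than status codes. A minimal sketch of how that list could be assembled follows, assuming the comma-separated /regex/ form used by the current --ignore-urls value. The helper name `build_ignore_urls` is hypothetical, the `google\.com\/search` regex is an assumed rendering of the glob mentioned above, and the bot-blocking domains are omitted because their exact hostnames are not listed here.

```shell
#!/bin/sh
# Join regex patterns (one per line on stdin) into a comma-separated
# list of /regex/ entries, matching the shape of the current --ignore-urls value.
build_ignore_urls() {
  out=""
  while IFS= read -r pattern; do
    [ -z "$pattern" ] && continue   # skip blank lines
    out="${out:+$out,}/${pattern}/"
  done
  printf '%s\n' "$out"
}

# Current status-code list with 0 removed, so dead hosts surface again.
IGNORE_STATUS="302,402,403,406,408,415,429,502,503,999"

# First two patterns come from the current config; the third is an assumption.
IGNORE_URLS="$(build_ignore_urls <<'EOF'
fonts.gstatic.com
github\.com\/DaveSkender\/Stock\.Indicators\/(blob|tree)\/
google\.com\/search
EOF
)"

# Print the flags that would be handed to the link checker.
printf '%s\n' "--ignore-status-codes \"$IGNORE_STATUS\""
printf '%s\n' "--ignore-urls \"$IGNORE_URLS\""
```

Keeping one pattern per line makes each exemption reviewable on its own, which helps with the maintenance cost noted for the allowlist approach.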
Suggested direction
Likely a combination: #1 + #3 (or #2) for signal, plus #4 to keep PRs non-flaky, plus #5 for the deliberate search-referral links. Whatever we pick, the acceptance test is: re-introducing a known-dead URL must fail the check.
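The acceptance test can be expressed as a canary: wrap the link-check command and require it to fail when it is pointed at a known-dead URL. This is a rough sketch only. The helper name `expect_failure` is hypothetical, and how the real check would be fed a fixture page is left open.

```shell
#!/bin/sh
# expect_failure COMMAND [ARGS...]
# Succeeds only when COMMAND fails, i.e. the checker rejected the dead link.
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: checker passed a known-dead URL; dead links are being suppressed" >&2
    return 1
  fi
  echo "OK: checker rejected the known-dead URL"
  return 0
}

# Stand-in for a checker that correctly fails on the canary.
# In CI this would wrap the real link-check command instead of `false`.
expect_failure false
# prints: OK: checker rejected the known-dead URL
```

Running the canary on every change to the link-check configuration would catch a reintroduced blanket exemption before it silently hides dead links again.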
Acceptance criteria
- […] 429 rate-limiting do not fail the check.
- […] search-path referral links are handled by pattern, not by status-code suppression.
- […] .github/workflows/test-website-links.yml and docs/.vitepress/test-links.sh.
Follow-up from PR #2106, which restored the blanket exemptions as a stopgap to unblock CI.