Finding
The main docs/content gates pass, but the MSLearn URL validator is not reliable as a quality gate.
python3 scripts/validate_mslearn_urls.py --project azure-storage-practical-guide reports failures dominated by HTTP 429 throttling.
scripts/validate_mslearn_urls.py:122-129 treats every non-200/non-404 response, including 429, as a broken URL.
scripts/validate_mslearn_urls.py:257-263 discovers repos by walking to the parent directory and globbing azure-*-practical-guide, so local validation from a sibling checkout can scan unrelated repos.
scripts/validate_mslearn_urls.py:57-63 assumes content_sources is always a mapping, which is fragile if legacy list-form frontmatter is introduced.
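To illustrate the content_sources fragility, here is a minimal sketch of a normalizer that accepts either shape. The helper name iter_source_urls is hypothetical, and the real frontmatter schema may differ. It assumes only that URLs appear as http(s) strings somewhere inside the mapping or list.

```python
from typing import Any, Iterator


def iter_source_urls(content_sources: Any) -> Iterator[str]:
    """Yield URL strings from mapping-form or list-form content_sources.

    Hypothetical helper: walks nested dicts and lists and keeps only
    strings that look like http(s) URLs. The real schema may differ.
    """
    if content_sources is None:
        return
    if isinstance(content_sources, str):
        # Non-URL strings (for example a "type" field) are skipped.
        if content_sources.startswith(("http://", "https://")):
            yield content_sources
        return
    if isinstance(content_sources, dict):
        for value in content_sources.values():
            yield from iter_source_urls(value)
        return
    if isinstance(content_sources, (list, tuple)):
        for item in content_sources:
            yield from iter_source_urls(item)
        return
    # Any other type (int, bool, ...) carries no URLs and is ignored.
```

Because the walk does not index into a fixed shape, a page with list-form frontmatter would yield its URLs instead of raising an AttributeError or TypeError.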
Why this matters
Storage currently has a clean validate_content_sources.py result, so URL validation should be a useful independent gate. Today it can fail because Microsoft Learn throttles requests, not because links are broken, and the default repo discovery depends on checkout layout.
Suggested fix
- Default to the current repository unless an explicit multi-repo sweep is requested.
- Retry/back off on HTTP 429 and classify throttled URLs separately.
- Handle both mapping-form and legacy list-form content_sources defensively.
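The 429 handling could look roughly like the sketch below. It is not the validator's current code. The function name check_url, the result labels, and the injected fetch and sleep callables are all assumptions made so the retry logic can be exercised without network access. It honors a numeric Retry-After value when one is supplied and otherwise backs off exponentially.

```python
import time
from typing import Callable, Optional, Tuple

# (HTTP status code, raw Retry-After header value or None)
FetchResult = Tuple[int, Optional[str]]


def check_url(
    url: str,
    fetch: Callable[[str], FetchResult],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Classify a URL as "ok", "not_found", "error", or "throttled".

    Hypothetical sketch: only HTTP 429 is retried; any other non-200
    status is classified immediately.
    """
    for attempt in range(max_attempts):
        status, retry_after = fetch(url)
        if status == 200:
            return "ok"
        if status == 404:
            return "not_found"
        if status != 429:
            return "error"
        # Throttled: wait before the next attempt, unless this was the last.
        if attempt + 1 < max_attempts:
            delay = base_delay * (2 ** attempt)
            # Prefer the server's hint when it is a plain number of seconds.
            if retry_after is not None and retry_after.strip().isdigit():
                delay = float(retry_after.strip())
            sleep(delay)
    # Every attempt was throttled; report it separately from broken links.
    return "throttled"
```

With a classification like this, the gate could fail only on "not_found" and "error" results and report "throttled" URLs as warnings to recheck later.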
Verification
mkdocs build --strict passes.
python3 scripts/validate_content_sources.py passes.
python3 scripts/validate_mslearn_urls.py --project azure-storage-practical-guide currently fails due to 429-classified errors.