fix: make MSLearn URL validation repo-scoped and rate-limit aware #6

@yeongseon

Finding

The main docs/content gates pass, but the MSLearn URL validator is not reliable as a quality gate.

  • python3 scripts/validate_mslearn_urls.py --project azure-storage-practical-guide reports failures dominated by HTTP 429 throttling.
  • scripts/validate_mslearn_urls.py:122-129 treats every non-200/non-404 response, including 429, as a broken URL.
  • scripts/validate_mslearn_urls.py:257-263 discovers repos by walking to the parent directory and globbing azure-*-practical-guide, so local validation from a sibling checkout can scan unrelated repos.
  • scripts/validate_mslearn_urls.py:57-63 assumes content_sources is always a mapping, which is fragile if legacy list-form frontmatter is introduced.
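One way to make the frontmatter handling shape-agnostic is sketched below. The actual schema of content_sources is not shown in this issue, so the sketch assumes only that Microsoft Learn URLs appear as strings somewhere inside the value; `iter_mslearn_urls` and `MSLEARN_PREFIX` are hypothetical names, not the script's existing API.

```python
from collections.abc import Mapping
from typing import Any, Iterator

# Assumed prefix for the URLs of interest; adjust if the script matches differently.
MSLEARN_PREFIX = "https://learn.microsoft.com/"


def iter_mslearn_urls(content_sources: Any) -> Iterator[str]:
    """Yield Microsoft Learn URLs from a content_sources value of any shape."""
    if isinstance(content_sources, str):
        if content_sources.startswith(MSLEARN_PREFIX):
            yield content_sources
    elif isinstance(content_sources, Mapping):
        # Mapping form: descend into every value, whatever the keys are called.
        for value in content_sources.values():
            yield from iter_mslearn_urls(value)
    elif isinstance(content_sources, (list, tuple)):
        # Legacy list form: descend into every item.
        for item in content_sources:
            yield from iter_mslearn_urls(item)
    # None, numbers, and other scalars are ignored rather than raising.
```

Because the walk never indexes by key, a page that switches between mapping-form and list-form frontmatter would still yield its URLs instead of raising.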

Why this matters

azure-storage-practical-guide currently has a clean validate_content_sources.py result, so URL validation should be a useful independent gate. Today it can fail because Microsoft Learn throttles requests, not because links are broken, and the default repo discovery depends on the local checkout layout.
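A minimal sketch of layout-independent discovery follows. It assumes the script lives at `<repo>/scripts/validate_mslearn_urls.py`; `discover_repos` and its `sweep_siblings` parameter are hypothetical names for illustration, and the glob pattern is the one quoted in the finding.

```python
from pathlib import Path
from typing import List

REPO_GLOB = "azure-*-practical-guide"  # pattern quoted in the finding


def discover_repos(script_path: Path, sweep_siblings: bool = False) -> List[Path]:
    """Return the repositories to validate.

    Assumes the script lives at <repo>/scripts/<name>.py, so the repository
    root is two levels above the script file.
    """
    repo_root = script_path.resolve().parent.parent
    if not sweep_siblings:
        # Default: validate only the repository that contains the script.
        return [repo_root]
    # Explicit opt-in: scan sibling checkouts that match the naming pattern.
    return sorted(p for p in repo_root.parent.glob(REPO_GLOB) if p.is_dir())
```

With this shape, a plain local run touches only the current checkout, and scanning sibling repositories becomes a deliberate choice rather than a side effect of where the repo was cloned.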

Suggested fix

  • Default to the current repository unless an explicit multi-repo sweep is requested.
  • Retry/back off on HTTP 429 and classify throttled URLs separately.
  • Handle both mapping-form and legacy list-form content_sources defensively.
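The retry and classification step might look roughly like the sketch below. It is not the script's current code: `check_url`, the `fetch` callable, and the result labels are hypothetical, and the HTTP call is injected so the logic can be exercised without network access. It assumes `fetch` returns the status code plus an optional Retry-After value in seconds.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical interface: fetch(url) -> (status_code, retry_after_seconds_or_None)
Fetch = Callable[[str], Tuple[int, Optional[float]]]


def check_url(
    url: str,
    fetch: Fetch,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Classify a URL as 'ok', 'not_found', 'throttled', or 'error'."""
    for attempt in range(max_attempts):
        status, retry_after = fetch(url)
        if status == 200:
            return "ok"
        if status == 404:
            return "not_found"
        if status != 429:
            return "error"
        if attempt < max_attempts - 1:
            # Honor Retry-After when the server sends it, else back off exponentially.
            delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
            sleep(delay)
    # Every attempt was throttled: report it separately from broken links.
    return "throttled"
```

A caller could then fail the gate only on `not_found` and `error`, and report `throttled` URLs as inconclusive so that Microsoft Learn rate limiting alone does not turn the check red.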

Verification

  • mkdocs build --strict passes.
  • python3 scripts/validate_content_sources.py passes.
  • python3 scripts/validate_mslearn_urls.py --project azure-storage-practical-guide currently fails, with the failures dominated by HTTP 429 responses that are counted as broken URLs.
