github_actions: monthly check for broken hyperlinks #7537

Merged: sspaink merged 3 commits into open-policy-agent:main on May 7, 2025

Conversation

Contributor

@sspaink sspaink commented Apr 29, 2025

resolve: #3249

Check for broken hyperlinks in markdown files using: https://github.com/lycheeverse/lychee
How could I not pick the project with an adorable/creepy smiling lychee as the mascot?

Using this GitHub Action to run lychee:
https://github.com/lycheeverse/lychee-action

I tested this workflow out on my fork: https://github.com/sspaink/opa/blob/main/.github/workflows/link_checker.yaml
Manually triggered it: https://github.com/sspaink/opa/actions/runs/14742343402
Then it created this issue: sspaink#5

This workflow runs every Monday. If any errors are found, it opens a single issue like the one above with a summary. Some links in ADOPTERS.md return 403 but are valid, so I added 403 as an accepted status. Hopefully that won't cause any problems; exceptions could also be added for the individual URLs instead.
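
As a sketch of the per-URL alternative mentioned above, lychee's `--exclude` option (a regular expression matched against each URL) could skip specific links instead of accepting 403 everywhere. The pattern below is a hypothetical placeholder, not one of the actual ADOPTERS.md links, and `RUN_LYCHEE` is a hypothetical switch so that the script only prints the command by default.

```shell
#!/bin/sh
# Hypothetical pattern: replace with the hosts that answer 403 to automated checks.
EXCLUDE_PATTERN='^https://example\.com/'

# Same flags as the local command in this PR, but 403 is no longer accepted
# globally; the excluded URLs are skipped instead.
set -- --no-progress \
  --exclude-path vendor \
  --exclude-path CHANGELOG.md \
  --accept '200..=206' \
  --exclude "$EXCLUDE_PATTERN"

# Dry run by default: print the command for review.
printf 'lychee %s .\n' "$*"

# Set RUN_LYCHEE=1 (hypothetical switch) to actually run the check.
if [ "${RUN_LYCHEE:-0}" = "1" ]; then
  lychee "$@" .
fi
```

The trade-off is maintenance: an exclude list has to be updated as links change, while accepting 403 globally may hide links that are genuinely forbidden.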

There are also lots of issues in the docs folder that seem valid, but maybe I am missing something about how the docs work. With the upcoming docs revamp, I'm not sure whether these errors should be ignored. @charlieegan3 advice on this would be appreciated 😄

You can also run the tool locally:

$ brew install lychee
$ lychee --no-progress --exclude-path vendor --exclude-path CHANGELOG.md --accept 200..=206,403 .

I originally experimented with https://github.com/UmbrellaDocs/linkspector, but that tool is designed to review incoming PRs. That could also be useful for preventing bad links from being submitted, but lychee seems faster and is already set up to run weekly, as the issue describes.

@sspaink sspaink changed the title from "chore: check for dead links with linkspector" to "chore: weekly check for broken hyperlinks" Apr 29, 2025
@sspaink sspaink marked this pull request as ready for review April 29, 2025 22:25
@sspaink sspaink added the github_actions label (Pull requests that update GitHub Actions code) Apr 30, 2025
@sspaink sspaink changed the title from "chore: weekly check for broken hyperlinks" to "github_actions: weekly check for broken hyperlinks" Apr 30, 2025
Contributor

@johanfylling johanfylling left a comment

Picking a random ./docs error in the issue generated for your fork:

[ERROR] file:///home/runner/work/opa/opa/docs/contrib-development#fork-clone-create-a-branch | Cannot find file

This is a valid link once the page has been generated. Maybe lychee (delicious, by the way) isn't fit for checking these kinds of links, and they should be excluded. Alternatively, we could check the generated page, but that would require us to build and run it first (maybe through Netlify).

Another random external link error:

[500] https://ceph.io/ | Network error: Internal Server Error

This site seems to be alive and well now. I don't see this kind of flakiness as a problem as long as we don't run this task often enough for it to become annoying, and as long as we don't need to sift through hundreds of false positives for local links.

  repository_dispatch:
  workflow_dispatch:
  schedule:
    - cron: "0 13 * * 1" # Every Monday at 1PM UTC (9AM EST)
Contributor

Maybe once a month is enough?

@charlieegan3
Contributor

This is going to be a build-time check when the new site drops, btw. We also have broken anchor checking as non-blocking, which I plan to address when that's merged. I'm unsure we need the async check, but no harm I guess.

@johanfylling
Contributor

@charlieegan3, do you mean that all the checks done here will be made on a per-PR basis with the new site, or only page-local linking?
Since some of the external links could be flaky, maybe we don't want that to affect all PRs.

@sspaink sspaink changed the title from "github_actions: weekly check for broken hyperlinks" to "github_actions: monthly check for broken hyperlinks" May 6, 2025

netlify bot commented May 6, 2025

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 8c800c4
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/681b88acc0d25c0008c069ec
😎 Deploy Preview https://deploy-preview-7537--openpolicyagent.netlify.app

@sspaink
Contributor Author

sspaink commented May 6, 2025

some changes in the latest commit, tried it out in my fork so you can see the example issue it will create: sspaink#12

  • checks monthly
  • only checks the https and http schemes, so it ignores the anchor links and avoids having to build the site
  • sets max concurrency to 1, which means the job took longer (17 minutes vs 1 minute) but produced far fewer "too many requests" errors
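
The changes listed above might translate into a local dry run like the following sketch. `--scheme` and `--max-concurrency` are lychee CLI options, but this thread does not show the exact arguments used in the merged workflow, so the flag set here is an assumption carried over from the earlier local command. `RUN_LYCHEE` is a hypothetical switch that keeps the script from making network requests by default.

```shell
#!/bin/sh
# Assumed flag set; the merged workflow's real arguments may differ.
# --scheme: only https:// and http:// links are checked, so file:// paths
#           and page-local anchors are skipped and no site build is needed.
# --max-concurrency 1: one request at a time, which is slower but triggers
#           fewer rate-limit ("too many requests") responses.
set -- --no-progress \
  --exclude-path vendor \
  --exclude-path CHANGELOG.md \
  --accept '200..=206,403' \
  --scheme https \
  --scheme http \
  --max-concurrency 1

# Dry run by default: print the command for review.
printf 'lychee %s .\n' "$*"

# Set RUN_LYCHEE=1 (hypothetical switch) to actually run the check.
if [ "${RUN_LYCHEE:-0}" = "1" ]; then
  lychee "$@" .
fi
```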

@charlieegan3
Contributor

> @charlieegan3, do you mean that all the checks done here will be made on a per-PR basis with the new site, or only page-local linking? Since some of the external links could be flaky, maybe we don't want that to affect all PRs.

Ahh, actually Docusaurus doesn't do this (I thought it did). And yeah, makes sense not to block PRs.

@johanfylling
Contributor

> some changes in the latest commit, tried it out in my fork so you can see the example issue it will create: sspaink#12

Wow, that's a lot of broken links. Did some unscientific investigation through random clicking; and indeed, all the ones I tried had the reported issue.

Contributor

@johanfylling johanfylling left a comment

👍

@charlieegan3
Contributor

Small request that we hold off fixing issues in the site's broken links until we have merged #7534. Then I will not need to manually port over the updates in content.

I hope we can get #7534 in next week 🙏

@sspaink sspaink merged commit 4a9cdf4 into open-policy-agent:main May 7, 2025
28 checks passed
Labels
github_actions Pull requests that update GitHub Actions code
Development

Successfully merging this pull request may close these issues.

Automatic scanning of docs for broken links
3 participants