@StrikerRUS StrikerRUS commented Sep 11, 2025

All our recent link check jobs are failing: https://github.com/microsoft/LightGBM/actions/workflows/linkchecker.yml. The last successful job was 5 months ago.

Most of the errors are 429 rate limit responses for GitHub URLs, but the rest are fairly random. It looks like our current linkchecker setup isn't reliable enough.

Initially I considered using linkinator, but then I found lychee. It looks simple to use yet powerful.

Latest linkchecker results:

That's it. 1058 links in 1076 URLs checked. 3 warnings found. 4 errors found.
Stopped checking at 2025-09-11 08:54:22+000 (48 minutes, 31 seconds)

New lychee results:

🔍 2188 Total (in 27s) ✅ 2167 OK 🚫 0 Errors 👻 21 Excluded

I left some inline comments below to explain a few non-obvious points.

@StrikerRUS StrikerRUS changed the title [WIP][ci][docs] fix link checking action by switching from linkchecker to lychee and update some links [ci][docs] fix link checking action by switching from linkchecker to lychee and update some links Sep 11, 2025
@StrikerRUS StrikerRUS marked this pull request as ready for review September 11, 2025 16:08

@jameslamb jameslamb left a comment

Thanks for all the hard work! This is awesome, I'm glad to see so many broken links fixed and outdated documentation removed.

I totally support switching to lychee. I've been using it in pydistcheck for a little while and really liking it.

Would you consider (can be a follow-up) using https://github.com/lycheeverse/lychee-action instead?

Benefits of this:

  • could remove more code from test-docs.sh
  • it's very fast (no need for conda, for example)

Here's an example of a setup like that: jameslamb/pydistcheck#312

I'm approving this PR because I'd be happy to see it merged as-is. Everything I've suggested could be follow-up PRs, if you agree with the suggestions.

.ci/test-docs.sh Outdated
# to see all gained links add "--dump" flag
lychee \
"--config=./docs/.lychee.toml" \
"--exclude-path=(^|/)docs/.*\.rst" \

I think we should also add this to lychee.toml: https://lychee.cli.rs/recipes/excluding-paths/

The more that gets put into that file, the easier it'd be to run this interactively during development.
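
As a rough illustration of that suggestion, the sketch below writes a hypothetical config file that carries the same pattern as the `--exclude-path` flag in the snippet above, using lychee's `exclude_path` config key. It then uses `grep -E` as a stand-in for lychee's regex engine to show which paths the pattern would skip. The temp file location and the sample paths are assumptions made for illustration; the real `docs/.lychee.toml` from this PR is not shown in the conversation.

```shell
# Sketch only: write a hypothetical lychee config into a temp directory.
# The PR's real config lives at docs/.lychee.toml; its contents are not shown here.
config_dir="$(mktemp -d)"
config_file="${config_dir}/lychee.toml"

cat > "${config_file}" <<'EOF'
# Skip reStructuredText sources under docs/ (same pattern as the CLI flag).
# Single quotes make this a TOML literal string, so the backslash needs no escaping.
exclude_path = ['(^|/)docs/.*\.rst']
EOF

# Approximate the pattern with grep -E to see which sample paths it would skip.
# lychee uses Rust regex syntax; for a pattern this simple the two should agree.
pattern='(^|/)docs/.*\.rst'
excluded_paths="$(printf '%s\n' \
    'docs/FAQ.rst' \
    './docs/Parameters.rst' \
    'docs/_build/html/FAQ.html' \
    'README.md' \
    | grep -E "${pattern}")"

echo "paths the pattern would exclude:"
echo "${excluded_paths}"
```

With the pattern in the config file, the command line shrinks to `lychee "--config=./docs/.lychee.toml" <inputs>`, which is easier to run interactively.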

StrikerRUS commented Sep 12, 2025

@jameslamb Thanks a lot for reviewing this! Nice to hear that you are already familiar with lychee 😃.

Would it be OK with you if I push the changes for the GitHub token and path exclusions in this PR, and move this to the GitHub Action in a follow-up PR?

@jameslamb

> Would it be OK with you if I push the changes for the GitHub token and path exclusions in this PR, and move this to the GitHub Action in a follow-up PR?

Yes absolutely!

@StrikerRUS

@jameslamb I've pushed the changes I promised you in #7027 (comment). Could you please check them when you have time? Please note that I had to add the stackoverflow site to the ignore list because of the following errors:

Issues found in 3 inputs. Find details below.

[docs/_build/html/FAQ.html]:
[403] https://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path | Rejected status code (this depends on your "accept" configuration): Forbidden

[docs/_build/html/Parameters-Tuning.html]:
[403] https://stats.stackexchange.com/questions/317073/explanation-of-min-child-weight-in-xgboost-algorithm | Rejected status code (this depends on your "accept" configuration): Forbidden

[README.md]:
[403] https://stackoverflow.com/questions/ask?tags=lightgbm | Rejected status code (this depends on your "accept" configuration): Forbidden
[403] https://stackoverflow.com/questions/tagged/lightgbm?sort=votes | Rejected status code (this depends on your "accept" configuration): Forbidden

🔍 2208 Total (in 26s) ✅ 2183 OK 🚫 4 Errors 👻 21 Excluded

Now results look like the following:

🔍 2208 Total (in 26s) ✅ 2183 OK 🚫 0 Errors 👻 25 Excluded
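
A minimal sketch of how such an ignore-list entry could look, assuming it is expressed as a regex passed to lychee's `--exclude` option (or the `exclude` key in the config file). The pattern below is hypothetical; the entries actually added in this PR are not shown in the conversation. `grep -E` stands in for lychee's regex engine to check that the pattern covers the four URLs reported above.

```shell
# Hypothetical exclusion pattern; the PR's actual ignore-list entries are not shown.
exclude_pattern='^https://(stackoverflow\.com|stats\.stackexchange\.com)/'

# The four URLs that returned 403 in the report above.
failing_urls='https://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path
https://stats.stackexchange.com/questions/317073/explanation-of-min-child-weight-in-xgboost-algorithm
https://stackoverflow.com/questions/ask?tags=lightgbm
https://stackoverflow.com/questions/tagged/lightgbm?sort=votes'

# Count how many of them the pattern would exclude.
matched_count="$(printf '%s\n' "${failing_urls}" | grep -E -c "${exclude_pattern}")"
echo "excluded ${matched_count} of 4 URLs"

# With lychee installed, the same pattern could be passed on the command line:
#   lychee --exclude "${exclude_pattern}" README.md
```

Excluding a whole site trades coverage for stability: links to it are no longer verified at all, which is why the "Excluded" count rises by the same number that "Errors" drops.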

@jameslamb jameslamb left a comment

This is looking great... all of my suggestions have been addressed and I'm happy to see this only takes around 2 minutes to run (build link) compared to nearly an hour for the linkchecker-based job (build link)!!!

@jameslamb jameslamb merged commit 6368375 into master Oct 11, 2025
56 checks passed
@jameslamb jameslamb deleted the ci/lychee branch October 11, 2025 05:21
@StrikerRUS

> ... I'm happy to see this only takes around 2 minutes to run (build link) compared to nearly an hour for the linkchecker-based job (build link)!!!

Yeah, thanks to lychee's GitHub token support, we no longer need to slow down link checking manually to avoid 429 rate limit errors.
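
For readers who want to try this locally, here is a minimal sketch of a wrapper that relies on lychee reading the token from the `GITHUB_TOKEN` environment variable (it also accepts `--github-token`). The function name `run_link_check` is hypothetical, and the config path is the one from the `.ci/test-docs.sh` snippet earlier in this conversation; this is not the script merged in the PR.

```shell
# Hypothetical helper: run lychee with the repo config, warning if no token is set.
run_link_check() {
    if ! command -v lychee >/dev/null 2>&1; then
        echo "lychee is not installed" >&2
        return 127
    fi
    if [ -z "${GITHUB_TOKEN:-}" ]; then
        # Without a token, requests to github.com are more likely to be rate limited.
        echo "warning: GITHUB_TOKEN is not set; GitHub links may fail with 429" >&2
    fi
    # lychee picks up GITHUB_TOKEN from the environment; extra args are the inputs.
    lychee "--config=./docs/.lychee.toml" "$@"
}
```

Usage would look like `run_link_check README.md ./docs/_build/html`, with `GITHUB_TOKEN` exported beforehand.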
