Skip to content

Check for 404 redirect URLs #2190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 11, 2025
Merged

Check for 404 redirect URLs #2190

merged 7 commits into from
Feb 11, 2025

Conversation

nprt
Copy link
Contributor

@nprt nprt commented Feb 4, 2025

Create a GitHub Actions Workflow that will check for 404 response codes for the list of redirect URLs.

This will help in tracking the missing URLs, primarily to prevent losing SEO ranking for the corresponding pages.

The script will run as a cron job on Mondays at 2pm UTC, on push, and as a manual action.

Copy link

netlify bot commented Feb 4, 2025

Deploy Preview for substrate-docs ready!

Name Link
🔨 Latest commit fd65411
🔍 Latest deploy log https://app.netlify.com/sites/substrate-docs/deploys/67ab18175c0b8f0008c64dee
😎 Deploy Preview https://deploy-preview-2190--substrate-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@ArshamTeymouri
Copy link

Do I understand this correctly that we would only need to run this once? I was wondering why would it need to be run as a github action if it can be run a few times locally to do the final cross checks.

@mordamax
Copy link
Collaborator

mordamax commented Feb 5, 2025

As I mentioned in matrix - IMO would be more correct to use .netlify/_redirects as source of URLs and docs.substrate.com as a domain, if the goal is to check the redirects, as it states in the description.

Currently it checks the list of hardcoded destinations, which are not redirects in fact.

Later if anyone is going to fix any of 404 - there's a high risk of unsync / mistake due to a need of supporting 2 sources

if that's something you're aware of and ok, we can for sure merge as is

cc @RemyLeBerre

@nprt
Copy link
Contributor Author

nprt commented Feb 5, 2025

Do I understand this correctly that we would only need to run this once? I was wondering why would it need to be run as a github action if it can be run a few times locally to do the final cross checks.

We'd have to run it periodically. The goal is to be able to track dead redirects easily. I'd also like to avoid reliance on me, if possible. We still have to decide until when to keep track of it ourselves.

As I mentioned in matrix - IMO would be more correct to use .netlify/_redirects as source of URLs and docs.substrate.com as a domain, if the goal is to check the redirects, as it states in the description.

Currently it checks the list of hardcoded destinations, which are not redirects in fact.

Later if anyone is going to fix any of 404 - there's a high risk of unsync / mistake due to a need of supporting 2 sources

if that's something you're aware of and ok, we can for sure merge as is

Afaik, there'll be no more changes to the redirects, or this repo overall. Regardless, it's a good call to ask. @kianenigma @RemyLeBerre do you think it's necessary to rely on the _redirects file itself, rather than on a duplicate? If so, I don't really mind changing the logic to parse the file for URLs. The idea was to make a simple solution that Remy (and I) can use.

@mordamax
Copy link
Collaborator

mordamax commented Feb 5, 2025

the main concern, is that you say redirects, but you're checking against destination, so you're not actually checking redirects :)

the url you're checking is https://docs.polkadot.com/develop/parachains/customize-parachain
the url you should check instead is http://docs.substrate.io/build/troubleshoot-your-code/ which ends up with non 404

end for example very first redirect in the _redirects file - http://docs.substrate.io/build/application-development/
it should fail this test, and destination it ended up with - is 404

@nprt
Copy link
Contributor Author

nprt commented Feb 11, 2025

the main concern, is that you say redirects, but you're checking against destination, so you're not actually checking redirects :)

the url you're checking is https://docs.polkadot.com/develop/parachains/customize-parachain the url you should check instead is http://docs.substrate.io/build/troubleshoot-your-code/ which ends up with non 404

end for example very first redirect in the _redirects file - http://docs.substrate.io/build/application-development/ it should fail this test, and destination it ended up with - is 404

After a discussion with @mordamax , we decided to use the pipeline to check against the docs.substrate.io links for redirects. This commit reflects that. The output will still reflect the missing polkadot.com URLs.

response=$(curl -L -o /dev/null -s -w "%{http_code}" -m 10 "$destination")

if [ "$response" = "404" ]; then
echo "::warning::$destination"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ain't going fail/report anywhere?
may be collect 404s into array and if it's empty exit 0, if there are elements 1 ?
alternatively send an email/matrix message to room?
otherwise im curious how will we not forget to check and analyze it periodically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ain't going fail/report anywhere?
otherwise im curious how will we not forget to check and analyze it periodically?

@RemyLeBerre and I have a recurring weekly call to go over this together (it's one of the call topics). It's a temporary check, and I expect we'll archive the repository/project before we stop having the call.

may be collect 404s into array and if it's empty exit 0, if there are elements 1 ?
alternatively send an email/matrix message to room?

I'd rather we try to avoid adding any integrations, since it's going to be decommissioned pretty soon. I think the same goes for the error code/status. I'd prefer classifying it as a warning, since it's not really a redirect error, and will be fixed on the other side (polkadot.com). This is more a source of information, rather than a trigger to do something in this repo.

Copy link
Collaborator

@mordamax mordamax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT 🚀

@mordamax mordamax merged commit 4c33bc8 into polkadot-developers:main Feb 11, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants