Programmatically found broken internal links #527

Open
@ff6347

Most appropriate sections of the p5.js website?

Other (specify if possible)

What is your operating system?

Mac OS

Web browser and version

not related to browsers

Actual Behavior

Broken links generate a 404

Expected Behavior

Links should serve the linked content

Steps to reproduce

Since I saw some reports of broken links on the website, I tried a programmatic approach using broken-link-checker.

How to:

Edit: To make it easier to generate an overview I created a script that outputs GitHub flavored markdown. https://github.com/ff6347/find-broken-links

npm install broken-link-checker
npx blc -roe https://p5js.org > report-blc-prod-live.txt

# The flags do the following
# --exclude-external, -e  Will not check external links.
# --ordered, -o           Maintain the order of links as they appear in their HTML document.
# --recursive, -r         Recursively scan ("crawl") the HTML document(s).

The output needs some cleaning using regular expressions afterwards.

Remove all reports that are okay:

^├───OK───.*?$\n

Then remove all reports of urls that have no broken links:

^Getting links from:.*?$\nFinished! \d{1,1000} links found. \d{1,1000} excluded. 0 broken.

Fix the reports where no blank line separates one report from the next.

Look for:

broken\.\n(\w)

Replace with:

broken.\n\n$1

These three replacements do most of the grunt work. A few false-positive reports about images remained, which I removed manually at the end.
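The three replacements above can also be applied in one pass with a small script. This is only a sketch: the patterns are copied from the steps above, while the function name `clean_report`, the `SAMPLE` text, the example.org URLs, and the exact shape of the BROKEN line are assumptions made for illustration.

```python
import re

# Patterns copied from the cleanup steps above. re.MULTILINE makes ^ and $
# match at the start and end of every line, not only of the whole text.
OK_LINE = re.compile(r"^├───OK───.*?$\n", re.MULTILINE)
PAGE_WITHOUT_BROKEN_LINKS = re.compile(
    r"^Getting links from:.*?$\nFinished! \d{1,1000} links found. "
    r"\d{1,1000} excluded. 0 broken.",
    re.MULTILINE,
)
MISSING_BLANK_LINE = re.compile(r"broken\.\n(\w)")


def clean_report(raw: str) -> str:
    """Apply the three replacements in the order described above."""
    text = OK_LINE.sub("", raw)
    text = PAGE_WITHOUT_BROKEN_LINKS.sub("", text)
    # \1 is Python's spelling of the $1 backreference used above.
    return MISSING_BLANK_LINE.sub(r"broken.\n\n\1", text)


# Hypothetical report excerpt; the BROKEN line format is assumed, not taken
# from a real run.
SAMPLE = (
    "Getting links from: https://example.org/\n"
    "├───OK─── https://example.org/a/\n"
    "Finished! 1 links found. 0 excluded. 0 broken.\n"
    "\n"
    "Getting links from: https://example.org/b/\n"
    "├───OK─── https://example.org/a/\n"
    "├─BROKEN─ https://example.org/missing/ (HTTP_404)\n"
    "Finished! 2 links found. 0 excluded. 1 broken.\n"
    "Getting links from: https://example.org/c/\n"
    "├─BROKEN─ https://example.org/gone/ (HTTP_404)\n"
    "Finished! 1 links found. 0 excluded. 1 broken.\n"
)

if __name__ == "__main__":
    print(clean_report(SAMPLE))
```

The order matters: the second pattern only matches a page once its OK lines are gone, because it expects the "Finished!" line directly after the "Getting links from:" line.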


Here is what I've found:

Low-Hanging Fruit

Some links in the footer and the reference create 301 redirects, which can be avoided through small tweaks.

These are the lines that need a trailing slash:

Generally these are negligible, but a HEAD request against them returns a 301 HTTP response. Since this is low-hanging fruit, it could be fixed.
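To confirm a redirect for a single link, a HEAD request that does not follow redirects is enough. The sketch below uses only the Python standard library; the helper names `head_status` and `with_trailing_slash`, as well as the list of file extensions, are assumptions made for illustration and not part of the site's tooling.

```python
import http.client
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed set of extensions that mark a file rather than a page. A dot alone
# is not treated as an extension, since page names may contain dots.
FILE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".css", ".js", ".json", ".xml"}


def with_trailing_slash(url: str) -> str:
    """Return url with a trailing slash on its path unless it points at a file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    extension = posixpath.splitext(path.rsplit("/", 1)[-1])[1].lower()
    if not path.endswith("/") and extension not in FILE_EXTENSIONS:
        path += "/"
    return urlunsplit(parts._replace(path=path))


def head_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request and return (status, Location header).

    http.client does not follow redirects, so a 301 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # Performs real network requests; the URL is only an example.
    url = "https://p5js.org/reference"
    print(url, head_status(url))
    print(with_trailing_slash(url), head_status(with_trailing_slash(url)))
```

If the first call reports a 301 and the second a 200, adding the trailing slash to the link in the source is the whole fix.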

301 redirects

Many links in the reference, generated from the JSDoc comments in the p5.js source, lack the trailing slash and produce a 301 HTTP redirect. Fixing them takes some tedious manual work in the source code. Leaving them as they are should not be a problem, since the site host and Astro take care of the redirects.

Broken Links

There are also some broken links generated from the JSDoc comments that need fixing, and I found broken links in the tutorials. The tutorial links are hard-coded into the .mdx files and need to be fixed one by one.

Next Steps

IMO the next steps should be:

  • Create separate issues for these, since fixing them based on this list alone is error-prone.
  • Check external links as well once the internal ones are fixed.

Below you can find the report

report created: Fri, 13 Sep 2024 11:06:07 GMT

Would you like to work on the issue?

yes
