Skip to content

Conversation

@kaby76
Copy link
Contributor

@kaby76 kaby76 commented Feb 1, 2026

This is a fix for #4734. I had Claude write a Bash script that detects stale links. The check is done per PR, per grammar, and is also done weekly. The script is _scripts/find-stale-links.sh and includes the --replace option, which modifies stale links and replaces them with web.archive.org links.

kaby76 and others added 8 commits February 1, 2026 17:18
Adds a script to detect and optionally replace stale HTTP links with
web.archive.org archived versions. The script uses git blame to find
when each link was added, then queries the Wayback Machine API for an
archive closest to that date.

Also adds a GitHub Actions workflow that runs weekly to check for stale
links, creates issues when found, and can create PRs with fixes when
triggered manually.

Fixes antlr#4734

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Handle URLs containing parentheses (e.g., Wikipedia links) by balancing
parens after extraction instead of excluding ) from the match pattern.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Handle URLs in BibTeX files where closing } was being included in the URL.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Example files are parse test cases, not documentation. Skip them to avoid
false positives from old example content. Keep checking .md files as they
may contain documentation with links worth validating.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add : to trailing punctuation cleanup to handle markdown like [text](url):
  This fixes the paren balancing for URLs followed by colons
- Expand example directory matching to catch variations like examples/,
  example/, examples-to-fix/, examples.errors/, etc.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@kaby76
Copy link
Contributor Author

kaby76 commented Feb 2, 2026

99 stale links detected:
script.output.txt

kaby76 and others added 5 commits February 2, 2026 05:49
- Add find-stale-link-grammars.sh script that checks only grammars
  changed between two git refs
- Add stale-links job to main.yml workflow to run on PRs
- Remove weekly stale-links.yml workflow (now done per-PR)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Some legitimate URLs like gnu.org can take up to 20 seconds to respond.
Increase the default timeout from 10s to 30s to be more forgiving.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
bash ../_scripts/find-stale-links.sh --replace
@kaby76 kaby76 marked this pull request as ready for review February 2, 2026 12:16
@kaby76 kaby76 marked this pull request as draft February 2, 2026 12:20
@kaby76
Copy link
Contributor Author

kaby76 commented Feb 2, 2026

Still getting timeouts with 30s wait on http links. May need to run the downloads in parallel so that it doesn't take too long with many links, and bump up to 1 minute wait.

kaby76 and others added 3 commits February 2, 2026 07:28
- Default timeout increased from 30s to 60s for slow CI environments
- Added --parallel N option (default: 10) for concurrent URL checks
- Restructured into 3 phases: collect URLs, check in parallel, process results

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Timeouts (HTTP 000) are now reported separately as warnings
- Only actual HTTP errors (404, 500, etc.) cause build failure
- Show HTTP status codes in reports for better debugging

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@kaby76 kaby76 marked this pull request as ready for review February 2, 2026 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant