Skip to content

Conversation

@aireilly
Copy link
Contributor

@aireilly aireilly commented Jul 30, 2025

Description

Adding link checker scripts + GitHub CI yml

Example CI failure: https://github.com/opendatahub-io/opendatahub-documentation/actions/runs/19084090396/job/54520156179

Summary by CodeRabbit

  • New Features

    • Added automated link validation for documentation. Modified AsciiDoc files are now checked for broken links in the continuous integration pipeline.
  • Chores

    • Implemented GitHub Actions workflow to validate links on push and pull requests. Configuration file added to exclude known or expected invalid links from validation checks.

@coderabbitai
Copy link

coderabbitai bot commented Jul 30, 2025

Walkthrough

A link validation system is introduced for AsciiDoc documentation. A GitHub Actions workflow triggers on push and pull requests to automatically check HTTP/HTTPS links in modified documentation files. Two new scripts work together: one identifies modified AsciiDoc files and determines affected documents through include dependencies, and another validates URLs with configurable ignore patterns using parallel HTTP checks.

Changes

Cohort / File(s) Summary
Link validation scripts
scripts/check-links.sh, scripts/links.ignore
New script to parse Asciidoctor output, extract HTTP/HTTPS URLs, apply ignore patterns via regex, and perform parallel curl-based URL validation (up to 10 concurrent). New ignore patterns file containing 16 regex patterns to exclude specific domains and URLs from validation.
Modified file detection
scripts/check-modified.sh
New script to identify modified .adoc files from Git history, categorize them as modules, assemblies, or books, and trace include dependencies to determine which books require link checking. Orchestrates link validation workflow by calling check-links.sh for each affected book.
Workflow automation
.github/workflows/check-links.yml
New GitHub Actions workflow triggered on push and pull_request to install asciidoctor, make scripts executable, and run the modified file detection and link checking pipeline on ubuntu-latest.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant CMD as check-modified.sh
    participant CHK as check-links.sh
    participant curl as curl/HTTP
    participant report as Report

    GHA->>GHA: Install asciidoctor
    GHA->>GHA: Make scripts executable
    GHA->>CMD: Execute check-modified.sh
    
    CMD->>CMD: Get modified .adoc files (HEAD~1..HEAD)
    CMD->>CMD: Categorize: modules, assemblies, books
    CMD->>CMD: Trace includes to find affected books
    
    loop For each affected book
        CMD->>CHK: Execute check-links.sh <book>
        CHK->>CHK: Render with asciidoctor
        CHK->>CHK: Extract URLs (HTTP/HTTPS)
        CHK->>CHK: Apply ignore patterns
        CHK->>CHK: Deduplicate URLs
        
        rect rgb(200, 220, 255)
        note over CHK,curl: Parallel URL checks (up to 10 concurrent)
        CHK->>curl: Check URL status (parallel)
        curl-->>CHK: HTTP response
        end
        
        CHK->>CHK: Track broken links in temp file
        CHK->>report: Report invalid statuses (red highlight)
        CHK-->>CMD: Exit code 1 if broken links found
    end
    
    CMD->>report: Final status report
    CMD-->>GHA: Exit with failure if any checks failed
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • scripts/check-links.sh: Complex Bash script with URL extraction logic, regex-based ignore pattern application, parallel curl invocation (xargs with -P flag), and error tracking requiring careful validation of parsing and HTTP status handling.
  • scripts/check-modified.sh: Moderate complexity involving Git history parsing, file categorization, include dependency tracing via grep/sed patterns, and orchestration logic that must correctly identify affected books.
  • Workflow integration: Straightforward GitHub Actions configuration, but correctness depends on proper script chaining and exit code propagation.
  • Key areas requiring attention:
    • Verify URL extraction regex correctly captures all link types in Asciidoctor output
    • Validate include pattern matching for transitive dependency detection (module → assembly → book)
    • Confirm parallel execution and error aggregation in check-links.sh
    • Ensure exit codes are properly propagated through the script chain

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: adding link checker scripts and GitHub CI workflow for automated link validation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
scripts/check-links.sh (4)

15-21: Argument parsing blocks legitimate use-cases

The script only accepts either the --global flag or exactly one file argument.
It becomes impossible to check several specific files at once, or to pass additional flags (for example a future --verbose). This limitation will quickly surface in CI when multiple .adoc files change in the same commit outside assemblies/.

-elif [ "$#" -eq 1 ]; then
-    INPUT_FILE="$1"
+elif [ "$#" -ge 1 ]; then
+    INPUT_FILE=("$@")   # allow an arbitrary list

28-37: Ignore list is hard-wired & will drift

All patterns are baked into the script; updating them requires committing code instead of tweaking CI variables.
Consider letting callers extend / override the list via an environment variable, e.g. LINKCHECK_IGNORE_REGEX, and splitting on |. That keeps the script generic and reusable across repos.


45-47: Only the last trailing punctuation character is stripped

${URL%[.,;:?!\]\)]} removes one character.
Strings such as https://foo.com). will be truncated to https://foo.com) and still fail.

-URL=${URL%[.,;:?!\]\)]}
+while [[ "$URL" =~ [.,;:?!\]\)]$ ]]; do
+  URL=${URL%?}
+done

83-86: Temporary file never cleaned up

mktemp creates a real file that stays in /tmp.
Trap EXIT to delete it:

+trap 'rm -f "$TMP_FILE"' EXIT
.github/workflows/check-links.yml (2)

8-19: Global job lacks cache & matrix – runtime will grow

Running the checker over all assemblies on every PR can take several minutes as the doc set grows.
Consider:

  • Caching Asciidoctor’s gem downloads (actions/cache)
  • Using a matrix to shard the file list and run checks in parallel

40-41: Pattern excludes moved or renamed files

The regex ^assemblies/[^/]+\.adoc$ skips files moved to sub-directories or renamed within nested paths.
If that was unintentional, drop the anchor or use **/*.adoc.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a915636 and 0766a32.

📒 Files selected for processing (2)
  • .github/workflows/check-links.yml (1 hunks)
  • scripts/check-links.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#860
File: modules/using-llm-as-a-judge-metrics-with-lmeval.adoc:62-65
Timestamp: 2025-07-16T09:05:50.422Z
Learning: In the opendatahub-documentation project, bredamc prefers comprehensive error detection in code reviews. Continue to point out potential errors whether they are related to the code changes in the PR or not, but clearly distinguish between errors introduced in the PR versus errors in existing code.
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#842
File: modules/enabling-the-multi-model-serving-platform.adoc:11-17
Timestamp: 2025-07-07T19:36:45.326Z
Learning: In the opendatahub-documentation project, AsciiDoc anchors like `ref-dashboard-configuration-options_dashboard` are automatically generated when module files are included in assembly files, rather than being manually defined with [[anchor]] syntax. Links to these anchors will not 404 because the anchor is created through the inclusion mechanism.
scripts/check-links.sh (1)

Learnt from: bredamc
PR: #842
File: modules/enabling-the-multi-model-serving-platform.adoc:11-17
Timestamp: 2025-07-07T19:36:45.326Z
Learning: In the opendatahub-documentation project, AsciiDoc anchors like ref-dashboard-configuration-options_dashboard are automatically generated when module files are included in assembly files, rather than being manually defined with [[anchor]] syntax. Links to these anchors will not 404 because the anchor is created through the inclusion mechanism.

🔇 Additional comments (2)
scripts/check-links.sh (1)

70-71: Regex misses many valid URLs

[a-zA-Z0-9./?=%_-]* excludes &, #, :, ~, etc.
Consider using a more permissive pattern or letting pandoc/asciidoctor emit links in XML/JSON and parse that.

.github/workflows/check-links.yml (1)

36-45: origin/${{ github.base_ref }} may not exist in shallow clones

Even with fetch-depth: 0, the remote ref is named refs/remotes/origin/<branch>.
Inside containers, Git sometimes flattens remotes ⇒ the diff can be empty and the job silently passes.

A more robust variant:

BASE=${{ github.event.pull_request.base.sha }}
git diff --name-only $BASE...HEAD

@aireilly aireilly force-pushed the add-link-checker branch 16 times, most recently from abb49f9 to 5ee707f Compare July 31, 2025 14:18
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
scripts/check-links.sh (2)

70-72: Past feedback still unaddressed – $0 in xargs & race

Prior review pointed out that relying on $0 inside bash -c is brittle and that xargs can pass the argument explicitly. The issue remains.

-        xargs -P 10 -n 1 bash -c "$PATTERNS_DECL; check_url \"\$0\""
+        xargs -P 10 -n 1 -I{} bash -c "$PATTERNS_DECL; check_url \"{}\""

This also sidesteps $0 quirks and makes the call self-documenting.


25-29: Single shared temp file + echo > is racy under parallel curl

Parallel subshells write to the same TMP_FILE; simultaneous truncation (> redirection) can clobber the flag and hide failures.

-echo "0" > "$TMP_FILE"
+printf '0' >"$TMP_FILE"

And inside check_url (Line 58)

-        echo "1" > "$TMP_FILE"
+        printf '1' >>"$TMP_FILE"

For complete safety allocate one file per process (mktemp --tmpdir check_links.XXXXXX) and test existence rather than contents.

🧹 Nitpick comments (2)
.github/workflows/check-links.yml (1)

13-17: Apt installs on every run – consider pre-built Asciidoctor action

apt-get update && apt-get install adds ~30 s to every CI run. Switching to the maintained asciidoctor/docker-action or a pre-built container cuts that to near-zero and avoids future package-availability failures.

scripts/check-links.sh (1)

3-4: Header comment is now misleading

The script no longer “Checks for 404 links” only. It validates all non-2xx/3xx statuses (except 000/403). Please update the comment to avoid confusion.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a21d915 and 5ee707f.

📒 Files selected for processing (3)
  • .github/workflows/check-links.yml (1 hunks)
  • scripts/check-links.sh (1 hunks)
  • scripts/links.ignore (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • scripts/links.ignore
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#860
File: modules/using-llm-as-a-judge-metrics-with-lmeval.adoc:62-65
Timestamp: 2025-07-16T09:05:50.422Z
Learning: In the opendatahub-documentation project, bredamc prefers comprehensive error detection in code reviews. Continue to point out potential errors whether they are related to the code changes in the PR or not, but clearly distinguish between errors introduced in the PR versus errors in existing code.
📚 Learning: in asciidoc assembly files, the conditional directive syntax `ifdef::parent-context[:context: {paren...
Learnt from: jiridanek
PR: opendatahub-io/opendatahub-documentation#801
File: assemblies/managing-basic-workbenches.adoc:30-31
Timestamp: 2025-06-17T10:51:51.206Z
Learning: In AsciiDoc assembly files, the conditional directive syntax `ifdef::parent-context[:context: {parent-context}]` and `ifndef::parent-context[:!context:]` is valid and commonly used for managing document context when files are included in different assemblies. This pattern should not be flagged as malformed syntax.

Applied to files:

  • scripts/check-links.sh
🔇 Additional comments (1)
scripts/check-links.sh (1)

54-57: Why are 403 & 000 silently accepted?

Treating all non-2xx/3xx statuses as broken is safer; many sites respond with 403 when public access is blocked (broken link for readers). Please document the rationale or remove the exception.

-if [[ "$STATUS" != "000" && "$STATUS" != "403" && ! "$STATUS" =~ ^(2|3)[0-9]{2}$ ]]; then
+if [[ ! "$STATUS" =~ ^(2|3)[0-9]{2}$ ]]; then

@aireilly aireilly force-pushed the add-link-checker branch 7 times, most recently from 77015ea to 95f7e2b Compare July 31, 2025 15:58
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
.github/workflows/check-links.yml (1)

20-24: Minor: avoid chmod when scripts are tracked as executable

check-links.sh and check-modified.sh already have the shebang and can be committed with the executable bit (git add --chmod=+x).
Dropping this step reduces noise.

[nitpick]

scripts/check-modified.sh (1)

44-47: Deduplicate list to avoid redundant link checks

When a module appears in multiple books, the current loop checks each book repeatedly.

-# Check links in the compiled list of books
-for f in "${UPDATED_BOOKS[@]}"; do
-  ./scripts/check-links.sh "$f"
-done
+# Check links once per unique book
+for f in $(printf '%s\n' "${UPDATED_BOOKS[@]}" | sort -u); do
+  ./scripts/check-links.sh "$f"
+done
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd6ea05 and 3cef4f1.

📒 Files selected for processing (6)
  • .github/workflows/check-links.yml (1 hunks)
  • assemblies/upgrading-odh-v2.adoc (1 hunks)
  • customizing-models-with-lab-tuning.adoc (1 hunks)
  • scripts/check-links.sh (1 hunks)
  • scripts/check-modified.sh (1 hunks)
  • scripts/links.ignore (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • customizing-models-with-lab-tuning.adoc
🚧 Files skipped from review as they are similar to previous changes (2)
  • scripts/links.ignore
  • scripts/check-links.sh
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#860
File: modules/using-llm-as-a-judge-metrics-with-lmeval.adoc:62-65
Timestamp: 2025-07-16T09:05:50.422Z
Learning: In the opendatahub-documentation project, bredamc prefers comprehensive error detection in code reviews. Continue to point out potential errors whether they are related to the code changes in the PR or not, but clearly distinguish between errors introduced in the PR versus errors in existing code.
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#842
File: modules/enabling-the-multi-model-serving-platform.adoc:11-17
Timestamp: 2025-07-07T19:36:45.326Z
Learning: In the opendatahub-documentation project, AsciiDoc anchors like `ref-dashboard-configuration-options_dashboard` are automatically generated when module files are included in assembly files, rather than being manually defined with [[anchor]] syntax. Links to these anchors will not 404 because the anchor is created through the inclusion mechanism.
📚 Learning: in asciidoc assembly files, the conditional directive syntax `ifdef::parent-context[:context: {paren...
Learnt from: jiridanek
PR: opendatahub-io/opendatahub-documentation#801
File: assemblies/managing-basic-workbenches.adoc:30-31
Timestamp: 2025-06-17T10:51:51.206Z
Learning: In AsciiDoc assembly files, the conditional directive syntax `ifdef::parent-context[:context: {parent-context}]` and `ifndef::parent-context[:!context:]` is valid and commonly used for managing document context when files are included in different assemblies. This pattern should not be flagged as malformed syntax.

Applied to files:

  • .github/workflows/check-links.yml
  • assemblies/upgrading-odh-v2.adoc
📚 Learning: in the opendatahub-documentation project, asciidoc anchors like `ref-dashboard-configuration-options...
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#842
File: modules/enabling-the-multi-model-serving-platform.adoc:11-17
Timestamp: 2025-07-07T19:36:45.326Z
Learning: In the opendatahub-documentation project, AsciiDoc anchors like `ref-dashboard-configuration-options_dashboard` are automatically generated when module files are included in assembly files, rather than being manually defined with [[anchor]] syntax. Links to these anchors will not 404 because the anchor is created through the inclusion mechanism.

Applied to files:

  • assemblies/upgrading-odh-v2.adoc
  • scripts/check-modified.sh
📚 Learning: in asciidoc files for opendatahub-documentation, when adding explanatory text after list items, avoi...
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#816
File: modules/deploying-models-on-the-single-model-serving-platform.adoc:0-0
Timestamp: 2025-06-16T11:11:20.681Z
Learning: In AsciiDoc files for opendatahub-documentation, when adding explanatory text after list items, avoid using block continuation markers ('+') if the text should appear in the same paragraph as the list item. The block continuation marker would move the text to a new paragraph within the same step, which may not be the desired rendering.

Applied to files:

  • assemblies/upgrading-odh-v2.adoc
📚 Learning: in the opendatahub-documentation project, avoid suggesting unnecessary admonition blocks (note, warn...
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#860
File: modules/using-llm-as-a-judge-metrics-with-lmeval.adoc:62-65
Timestamp: 2025-07-16T09:10:15.185Z
Learning: In the opendatahub-documentation project, avoid suggesting unnecessary admonition blocks (NOTE, WARNING, IMPORTANT, etc.) for explanatory text that can be effectively communicated inline. The documentation team prefers cleaner, less cluttered documentation with inline text unless the information is critical enough to warrant special formatting emphasis.

Applied to files:

  • assemblies/upgrading-odh-v2.adoc
📚 Learning: in asciidoc documentation for opendatahub-documentation, when adding explanatory text after a list i...
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#816
File: modules/deploying-models-on-the-single-model-serving-platform.adoc:0-0
Timestamp: 2025-06-16T11:11:27.675Z
Learning: In AsciiDoc documentation for opendatahub-documentation, when adding explanatory text after a list item, the preferred rendering is to keep the text in the same paragraph as the list item (no `+` continuation character) rather than creating a separate paragraph within the same step using the `+` block continuation character.

Applied to files:

  • assemblies/upgrading-odh-v2.adoc
📚 Learning: in the opendatahub-documentation repository, some lines using asciidoc comment syntax (`//`) are del...
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#816
File: modules/deploying-teacher-and-judge-models.adoc:0-0
Timestamp: 2025-06-16T11:06:25.386Z
Learning: In the opendatahub-documentation repository, some lines using AsciiDoc comment syntax (`//`) are deliberately commented out and should not be converted to visible NOTE blocks or other visible content. The author intentionally keeps certain guidance as non-rendered comments.

Applied to files:

  • assemblies/upgrading-odh-v2.adoc

@aireilly aireilly force-pushed the add-link-checker branch 3 times, most recently from aab0ece to 9b975db Compare November 4, 2025 19:37
@aireilly
Copy link
Contributor Author

aireilly commented Nov 4, 2025

@aduquett This simplifed upstream link checker is ready. PTAL

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/check-links.sh (1)

44-44: Fix malformed glob pattern in bash parameter expansion.

The pattern [.,;:?!\]\)]} fails in Bash 5.1+ because the unescaped ] closes the bracket expression prematurely, leaving \)] as literal characters outside the set. This causes a "bad pattern" error.

To trim trailing punctuation correctly, use the POSIX character class or fix the bracket escape order:

-    URL=${URL%[.,;:?!\]\)]}
+    URL=${URL%[.,;:?!:)}\]}

Or prefer the more robust approach:

-    URL=${URL%[.,;:?!\]\)]}
+    # Trim a single trailing punctuation char if present
+    URL="${URL%[[:punct:]]}"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a013036 and 83020f2.

📒 Files selected for processing (4)
  • .github/workflows/check-links.yml (1 hunks)
  • scripts/check-links.sh (1 hunks)
  • scripts/check-modified.sh (1 hunks)
  • scripts/links.ignore (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/links.ignore
  • scripts/check-modified.sh
  • .github/workflows/check-links.yml
🔇 Additional comments (1)
scripts/check-links.sh (1)

52-58: Clarify intent: why allow HTTP 000 and 403?

Line 54 accepts three status categories: 000 (timeout/connection error), 403 (Forbidden), and 2xx/3xx (success/redirect). Accepting 000 seems counterintuitive—it typically indicates a curl failure or timeout rather than a valid link. Accepting 403 may be deliberate (sites blocking bots), but both warrant documentation.

If 000 should actually flag as an error, change the condition to:

-    if [[ "$STATUS" != "000" && "$STATUS" != "403" && ! "$STATUS" =~ ^(2|3)[0-9]{2}$ ]]; then
+    if [[ "$STATUS" != "403" && ! "$STATUS" =~ ^(2|3)[0-9]{2}$ ]]; then

If both are intentional, add a comment explaining the rationale.

@aireilly aireilly force-pushed the add-link-checker branch 2 times, most recently from f8998c1 to b07ce1a Compare November 4, 2025 21:36
@aireilly aireilly closed this Nov 4, 2025
@aireilly aireilly reopened this Nov 4, 2025
@aireilly aireilly force-pushed the add-link-checker branch 8 times, most recently from 48e0d40 to b437806 Compare November 4, 2025 22:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/check-modified.sh (1)

1-4: Add Bash safety switches for robust execution.

The script lacks standard Bash error-handling switches. Add these at the top to catch errors early and handle filenames with spaces safely.

-#!/bin/bash
-#
-# Checks for 404 links in a compiled list of modified books
+#!/usr/bin/env bash
+set -Eeuo pipefail
+IFS=$'\n\t'
+#
+# Checks for 404 links in a compiled list of modified books
🧹 Nitpick comments (2)
scripts/check-links.sh (2)

27-27: Remove unused variable FOUND_ERRORS.

Line 27 declares FOUND_ERRORS=0 but it's never used; the script correctly tracks errors by writing to $TMP_FILE instead. Remove this unused variable.

 TMP_FILE=$(mktemp)
-FOUND_ERRORS=0

64-77: Add trap to ensure temporary file cleanup on script exit.

The temporary file created at line 26 is never explicitly deleted. While not critical for short-lived CI runs, it's good practice to clean up resources. Add a trap to ensure cleanup even if the script exits early due to errors.

+trap 'rm -f "$TMP_FILE"' EXIT
+
 run_url_checks() {
     local FILE="$1"
     echo -e "\n\033[32mChecking: $FILE\033[0m"
     asciidoctor "$FILE" -a doctype=book -o - | \
         grep -Eo '(http|https)://[a-zA-Z0-9./?=%_-]*' | \
         sort -u | \
         xargs -P 10 -I{} bash -c "$PATTERNS_DECL; check_url '{}' || echo '1' >> '$TMP_FILE'"
 }
 
 run_url_checks "$INPUT_FILE"
 
 if [ -s "$TMP_FILE" ]; then
     exit 1
 fi
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83020f2 and b437806.

📒 Files selected for processing (5)
  • .github/workflows/check-links.yml (1 hunks)
  • check-links.yml (1 hunks)
  • scripts/check-links.sh (1 hunks)
  • scripts/check-modified.sh (1 hunks)
  • scripts/links.ignore (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • check-links.yml
  • scripts/links.ignore
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/check-links.sh

[warning] 27-27: FOUND_ERRORS appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (1)
.github/workflows/check-links.yml (1)

1-28: Workflow file appears clean, but depends on scripts to handle complex logic.

The workflow itself is straightforward and delegates the main logic to the supporting scripts. However, verify that scripts/check-modified.sh correctly handles:

  1. Modified .adoc files in nested directories under assemblies/
  2. Correct git reference comparison on forked PRs

The issues flagged in previous reviews were in the supporting scripts, not in this workflow file.

@aireilly aireilly force-pushed the add-link-checker branch 2 times, most recently from 791b9f3 to bea595a Compare November 5, 2025 14:09
smccarthy-ie
smccarthy-ie previously approved these changes Nov 5, 2025
@aireilly aireilly force-pushed the add-link-checker branch 3 times, most recently from 3f8fd54 to c62ca49 Compare November 5, 2025 14:18
Signed-off-by: Aidan Reilly <[email protected]>

Signed-off-by: Aidan Reilly <[email protected]>

Signed-off-by: Aidan Reilly <[email protected]>

Signed-off-by: Aidan Reilly <[email protected]>

Signed-off-by: Aidan Reilly <[email protected]>

Signed-off-by: Aidan Reilly <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
scripts/check-links.sh (1)

27-27: Remove unused variable FOUND_ERRORS.

FOUND_ERRORS is declared but never referenced. This flags as SC2034 in shellcheck.

 # Create temp file for flagging broken links
 TMP_FILE=$(mktemp)
-FOUND_ERRORS=0
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b437806 and 8952032.

📒 Files selected for processing (4)
  • .github/workflows/check-links.yml (1 hunks)
  • scripts/check-links.sh (1 hunks)
  • scripts/check-modified.sh (1 hunks)
  • scripts/links.ignore (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/check-links.yml
  • scripts/check-modified.sh
  • scripts/links.ignore
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/check-links.sh

[warning] 27-27: FOUND_ERRORS appears unused. Verify use (or export if used externally).

(SC2034)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check links in modified AsciiDoc files
🔇 Additional comments (3)
scripts/check-links.sh (3)

40-59: URL validation logic is sound.

The function correctly handles pattern matching, timeout/retry logic, and HTTP status validation.


70-70: Concurrent writes to shared TMP_FILE could be more robust.

Up to 10 parallel xargs workers append to the same TMP_FILE, which can have atomicity concerns in high-concurrency scenarios, even with append operations. While the current logic only checks file non-emptiness (making it fairly safe in practice), this was flagged as a major concern in previous reviews.

For enhanced robustness, consider using per-worker atomic markers (unique temp files per process or process-based filenames) instead of a shared file. However, if this approach is acceptable for the documentation use case, it can remain as-is.

Did you intend to address the concurrent-write race condition flagged in previous reviews, or is the current append-based approach acceptable for this workflow's scale and reliability requirements?


73-77: Main flow is correct.

The script properly aggregates errors and exits with appropriate status codes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants