56 changes: 52 additions & 4 deletions .github/workflows/nomad.yaml
@@ -139,22 +139,70 @@ jobs:
NOMAD_VAR_holochain_bin_url: "${{ env.HOLOCHAIN_BIN_URL }}"
run: |-
set -euo pipefail
- nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl)

echo "Running Nomad job: ${JOB_NAME}"
if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then
@mattyg (Member), Dec 20, 2025

I thought set -euo pipefail should already cause the command within $() to send its stderr to the shell's stderr, which GitHub Actions would then print?
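
On the question above: command substitution captures only stdout, regardless of `set -euo pipefail`, so the command's stderr already reaches the job log. What the original form loses is the captured stdout, because `set -e` aborts the script at the failing assignment before `echo "$nomad_output"` runs. A minimal sketch of the difference, where `noisy_cmd` is a hypothetical stand-in for the `nix run ... nomad job run` call:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the real command: writes to both streams, then fails.
noisy_cmd() {
  echo "detail on stdout"
  echo "detail on stderr" >&2
  return 2
}

# Plain $(...) captures stdout only; the stderr line is printed directly.
# `|| true` keeps `set -e` from aborting this demo at the failing assignment.
stdout_only=$(noisy_cmd) || true

# With 2>&1 inside $(...), both streams end up in the variable.
both_streams=$(noisy_cmd 2>&1) || true

printf 'captured without 2>&1: %s\n' "$stdout_only"
printf 'captured with 2>&1:\n%s\n' "$both_streams"
```

So the `2>&1` in the new code mainly changes where stderr ends up (in the variable rather than directly in the log); the explicit `echo "$nomad_output"` in the failure branch is what makes the captured text visible.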

echo "ERROR: Failed to run Nomad job"
echo "Nomad command exit code: $?"
echo "Output:"
echo "$nomad_output"
exit 1
Comment on lines +144 to +149
⚠️ Potential issue | 🔴 Critical

Critical: Exit code capture is incorrect.

Line 146 attempts to capture the Nomad command's exit code with $?, but this occurs after running echo "ERROR: Failed to run Nomad job" on line 145. The $? variable always holds the exit status of the last executed command, so it will be 0 (from the successful echo), not the actual Nomad failure code. This defeats the debugging purpose stated in the PR title "Debug exit code 2".

🔎 Proposed fix
           echo "Running Nomad job: ${JOB_NAME}"
-          if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then
+          nomad_exit_code=0
+          nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1) || nomad_exit_code=$?
+          if [ "$nomad_exit_code" -ne 0 ]; then
             echo "ERROR: Failed to run Nomad job"
-            echo "Nomad command exit code: $?"
+            echo "Nomad command exit code: $nomad_exit_code"
             echo "Output:"
             echo "$nomad_output"
             exit 1
           fi
🤖 Prompt for AI Agents
In .github/workflows/nomad.yaml around lines 144 to 149 the script logs the
Nomad command exit code using `$?` after running echo, so it always reports 0.
Reading `$?` as the first statement inside the `if ! ...; then` branch does not
help either, because `!` has already inverted the failing status to 0. Capture
the status from the command itself (e.g., `nomad_output=$(nix run ...) ||
rc=$?`), then use that saved variable when printing "Nomad command exit code"
and exit with that code (or a clear non-zero).
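
One detail worth verifying before applying any fix here: in bash, `!` inverts the status of the command it precedes, so `$?` read inside the then-branch of `if ! cmd; then` is 0 even when it is the very first statement. A minimal sketch of the pitfall and of one way around it, using a hypothetical `failing_cmd` in place of the `nix run ... nomad job run` call:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the failing `nomad job run` call.
failing_cmd() {
  echo "job failed"
  return 2
}

# Pitfall: `!` inverts the status, so $? inside the branch is already 0.
if ! output=$(failing_cmd 2>&1); then
  status_in_branch=$?
fi

# Capturing with `||` preserves the real status and does not trip `set -e`.
status_direct=0
output=$(failing_cmd 2>&1) || status_direct=$?

echo "inside 'if !' branch: $status_in_branch"   # prints 0
echo "captured with ||:     $status_direct"      # prints 2
```

The `||` form is only one option; an `if cmd; then ...; else rc=$?; fi` structure also sees the real status in its else-branch.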

fi

echo "$nomad_output"
echo "Ran ${JOB_NAME} with run ID ${RUN_ID}" >> "$GITHUB_STEP_SUMMARY"
- alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' | paste -sd ' ' -)

echo "Extracting allocation IDs from Nomad output..."
if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then

⚠️ Potential issue | 🟠 Major

Use grep -oE with extended regex instead of PCRE for portability.

The current implementation uses grep -oP with PCRE-specific features (\K and the lookahead (?=...)), which not every grep implementation supports. Based on project learnings for wind-tunnel workflows, allocation IDs should be extracted with extended regex (grep -oE) for better portability, and the pattern should also accept hyphens in the IDs: [0-9a-f-]+.

🔎 Proposed fix for portability
           echo "Extracting allocation IDs from Nomad output..."
-          if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
+          if ! alloc_ids=$(echo "$nomad_output" | grep -oE --color=never 'Allocation "[0-9a-f-]+" created' | grep -oE '"[0-9a-f-]+"' | tr -d '"' 2>&1); then
             grep_exit_code=$?

Alternatively, if the Nomad output format is predictable, you could use a simpler pattern:

           echo "Extracting allocation IDs from Nomad output..."
-          if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
+          if ! alloc_ids=$(echo "$nomad_output" | grep -oE 'Allocation "[0-9a-f-]+"' | cut -d'"' -f2 2>&1); then
             grep_exit_code=$?
📝 Committable suggestion

Suggested change
- if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
+ if ! alloc_ids=$(echo "$nomad_output" | grep -oE 'Allocation "[0-9a-f-]+"' | cut -d'"' -f2 2>&1); then
  grep_exit_code=$?
🤖 Prompt for AI Agents
In .github/workflows/nomad.yaml around line 156, replace the PCRE grep -oP usage
with a portable grep -oE solution: use an extended regex that allows hyphens in
IDs (e.g. match "Allocation \"[0-9a-f-]+\"" or just the quoted ID with
'"[0-9a-f-]+"'), then strip the surrounding quotes (via sed or tr) to produce
the raw allocation IDs; this removes PCRE-only constructs (\K and lookahead) and
ensures portability across grep implementations.
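
The portable extraction described above can be tried in isolation. This is a sketch only: the sample text is illustrative and not taken from a real run, so the exact shape of `nomad job run` output is an assumption here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative input only; real `nomad job run` output may be formatted differently.
sample_output='==> Monitoring evaluation "1a2b3c4d"
    Allocation "0f1e2d3c-aaaa-bbbb-cccc-0123456789ab" created: node "n1", group "g"
    Allocation "deadbeef-0000-1111-2222-333344445555" created: node "n2", group "g"'

# Match the quoted ID together with its surrounding words using ERE only,
# then take the second double-quote-delimited field (the ID itself)
# and join the IDs onto one space-separated line.
alloc_ids=$(printf '%s\n' "$sample_output" \
  | grep -oE 'Allocation "[0-9a-f-]+" created' \
  | cut -d'"' -f2 \
  | paste -sd ' ' -)

echo "$alloc_ids"
```

Keeping `Allocation "` and `" created` in the first pattern prevents other quoted hex strings, such as the evaluation ID in the sample, from being picked up.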

grep_exit_code=$?
echo "ERROR: grep command failed with exit code ${grep_exit_code}"
if [ $grep_exit_code -eq 1 ]; then
echo "No allocation IDs found in Nomad output (no matches)"
elif [ $grep_exit_code -eq 2 ]; then
echo "grep encountered an error (invalid regex or other issue)"
fi
echo "Full Nomad output for debugging:"
echo "--- START NOMAD OUTPUT ---"
echo "$nomad_output"
echo "--- END NOMAD OUTPUT ---"
exit 1
fi
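
The branches above rely on grep's documented exit statuses: 0 when a line matched, 1 when nothing matched, and greater than 1 (2 in GNU grep) on an error such as an invalid pattern. A small sketch of observing them follows; `grep_status` is a hypothetical helper, and the status is captured with `||` because `$?` read inside an `if ! ...; then` branch has already been inverted to 0.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run grep on stdin and print its exit status,
# without letting `set -e` abort the script on a non-zero status.
grep_status() {
  local status=0
  grep -qE "$1" >/dev/null 2>&1 || status=$?
  echo "$status"
}

match=$(printf 'Allocation "abc123" created\n' | grep_status 'Allocation "[0-9a-f-]+"')
no_match=$(printf 'no allocations here\n' | grep_status 'Allocation "[0-9a-f-]+"')
# An unbalanced parenthesis is an invalid extended regex.
bad_regex=$(printf 'anything\n' | grep_status 'Allocation (')

echo "match:     $match"
echo "no match:  $no_match"
echo "bad regex: $bad_regex"
```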

if ! alloc_ids=$(echo "$alloc_ids" | paste -sd ' ' - 2>&1); then
echo "ERROR: Failed to format allocation IDs"
echo "paste command failed"
echo "Raw allocation IDs:"
echo "$alloc_ids"
exit 1
fi

if [ -z "$alloc_ids" ]; then
- echo "Failed to extract allocation IDs from Nomad job output"
echo "ERROR: Extracted allocation IDs string is empty"
echo "Full Nomad output for debugging:"
echo "--- START NOMAD OUTPUT ---"
echo "$nomad_output"
echo "--- END NOMAD OUTPUT ---"
exit 1
fi

echo "Successfully extracted allocation IDs: $alloc_ids"

echo "Reading job duration from nomad/vars/${JOB_NAME}.json..."
if ! duration="$(jq -e -r '.duration' "nomad/vars/${JOB_NAME}.json" 2>&1)"; then
echo "ERROR: Failed to read duration from nomad/vars/${JOB_NAME}.json"
echo "jq output: $duration"
exit 1
fi

- duration="$(jq -e -r '.duration' "nomad/vars/${JOB_NAME}.json")"
echo "Job duration: ${duration}s"

echo "alloc_ids=$alloc_ids" >> "$GITHUB_OUTPUT"
echo "started_at=$(date +%s)" >> "$GITHUB_OUTPUT"
echo "job_name=${JOB_NAME}" >> "$GITHUB_OUTPUT"
echo "duration=$duration" >> "$GITHUB_OUTPUT"

echo "Successfully configured Nomad job outputs"

- name: Save alloc_ids to file
run: |
started_at=${{ steps.run-nomad-job.outputs.started_at }}