feat(nomad): get `peer_end_count` by summing `peer_end_count` from run summaries inside of allocations by their run_id by veeso · Pull Request #581 · holochain/wind-tunnel

veeso · 2026-03-17T12:29:25Z

Summary

When running scenarios via Nomad, the peer_end_count in the generated run_summary.jsonl was always set to the configured peer_count, ignoring agents that may have dropped during the run. This PR fixes that by fetching each allocation's run_summary.jsonl via the Nomad API, reading the actual peer_end_count, and summing them per run_id.

Changes

- Added a `fetch_peer_end_count` function that reads `peer_end_count` from an allocation's `run_summary.jsonl` via `nomad alloc fs`, falling back to the configured `peer_count` if the file is unavailable.
- Before generating the combined run summary, iterate over all allocations to build a `peer_end_count_by_run` map keyed by `run_id`.
- Use the summed `peer_end_count` in the jq template instead of hardcoding it to `peer_count`.
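
As a rough illustration of the aggregation step described above, the sketch below sums per-allocation values into a map keyed by run_id. It is not the script's actual code: `fetch_peer_end_count` here is a hard-coded stand-in for the real function, `sum_peer_end_count_by_run` is a hypothetical name, and the input is simplified to three columns (`alloc_id,run_id,peer_count`) rather than the full allocs CSV. It assumes Bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stand-in: the real function reads run_summary.jsonl via `nomad alloc fs`.
# Unknown allocations return the fallback, mimicking the "file unavailable" case.
fetch_peer_end_count() {
    local alloc_id="$1"
    local fallback_peer_count="$2"
    case "$alloc_id" in
        alloc-a) echo 2 ;;
        alloc-b) echo 1 ;;
        *) echo "$fallback_peer_count" ;;
    esac
}

# Reads "alloc_id,run_id,peer_count" lines from stdin and prints one
# "run_id=summed_peer_end_count" line per run, sorted by run_id.
sum_peer_end_count_by_run() {
    declare -A peer_end_count_by_run
    local alloc_id run_id peer_count this_alloc_peer_end_count
    while IFS=',' read -r alloc_id run_id peer_count; do
        this_alloc_peer_end_count=$(fetch_peer_end_count "$alloc_id" "$peer_count")
        # Start from 0 the first time a run_id is seen, then accumulate.
        peer_end_count_by_run[$run_id]=$(( ${peer_end_count_by_run[$run_id]:-0} + this_alloc_peer_end_count ))
    done
    for run_id in "${!peer_end_count_by_run[@]}"; do
        echo "$run_id=${peer_end_count_by_run[$run_id]}"
    done | sort
}
```

Feeding it the lines `alloc-a,run-1,2`, `alloc-b,run-1,2` and `alloc-c,run-2,4` prints `run-1=3` and `run-2=4`: the two allocations of `run-1` are summed, and `alloc-c` has no stubbed value, so its configured `peer_count` of 4 is used.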

closes #504

TODO:

- All code changes are reflected in docs, including module-level docs
- All new/edited/removed scenarios are reflected in summary visualiser tool (see checklist)
- I ran the Nomad CI workflow successfully on my branch

Note that all commits in a PR must follow Conventional Commits before it can be merged, as these are used to generate the changelog

Summary by CodeRabbit

Chores
- Improved CI allocation data collection to aggregate final peer counts per run, updating exported run summaries to use aggregated values with fallbacks and emitting warnings when allocation-level data is missing, enhancing accuracy of run-level metrics.

coderabbitai · 2026-03-17T12:29:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8829acb7-466f-4cd2-9305-72b7f03e9fed

📥 Commits

Reviewing files that changed from the base of the PR and between 6de2fbe and f71d584.

📒 Files selected for processing (1)

nomad/scripts/ci_allocs.sh

🚧 Files skipped from review as they are similar to previous changes (1)

nomad/scripts/ci_allocs.sh

Walkthrough

The script nomad/scripts/ci_allocs.sh now fetches each allocation's alloc/run_summary.jsonl via nomad alloc fs, extracts per-allocation peer_end_count, aggregates these counts by run_id, and uses the aggregated value (falling back to configured peer_count) when generating run-summary JSONL.

Changes

Cohort / File(s)	Summary
Nomad CI allocation script `nomad/scripts/ci_allocs.sh`	Added `fetch_peer_end_count(alloc_id, fallback_peer_count)` which reads `alloc/run_summary.jsonl` via `nomad alloc fs` and jq, aggregating `.peer_end_count`. Modified `generate_run_summary` to build `peer_end_count_by_run` and use the aggregated per-run value (with fallback to `peer_count`) when emitting JSON objects.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

ci: parallelise the running of nomad jobs #272 — Changes how alloc IDs are produced/serialized; related because this PR reads per-allocation run_summary.jsonl and aggregates peer_end_count from those alloc IDs.

Suggested reviewers

cdunster
jost-s

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: fetching and summing peer_end_count from allocation run summaries by run_id instead of using the configured peer_count.
Description check	✅ Passed	The PR description covers the summary, specific changes made, and references the closed issue, though the TODO checklist items remain uncompleted.
Linked Issues check	✅ Passed	The code changes implement all primary objectives from issue `#504`: added fetch_peer_end_count function to retrieve and sum peer_end_count from allocations by run_id, and modified generate_run_summary to use aggregated values instead of peer_count.
Out of Scope Changes check	✅ Passed	All changes are scoped to the ci_allocs.sh script and directly address the objectives defined in issue `#504` with no unrelated modifications detected.


github-actions · 2026-03-17T12:29:43Z

The following will be added to the changelog

[0.6.1] - 2026-03-19

Features

(nomad) Get peer_end_count by summing peer_end_count from run summaries inside of allocations by their run_id

claude · 2026-03-19T08:41:18Z

Claude finished @veeso's task in 50s.

No issues found.

coderabbitai

🧹 Nitpick comments (2)

nomad/scripts/ci_allocs.sh (2)
77-77: Nitpick: Move local declaration outside the loop.

Declaring local inside the loop re-declares on each iteration. Moving it before the loop is cleaner.
✨ Suggested improvement
     declare -A peer_end_count_by_run
+    local this_alloc_peer_end_count
     while IFS=',' read -r _job_name _scenario_name alloc_id _run_id _started_at _duration _peer_count _behaviours; do
-        local this_alloc_peer_end_count
         this_alloc_peer_end_count=$(fetch_peer_end_count "$alloc_id" "$_peer_count")
         peer_end_count_by_run["$_run_id"]="$(( ${peer_end_count_by_run["$_run_id"]:-0} + this_alloc_peer_end_count ))"
     done < "$allocs_csv_file"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nomad/scripts/ci_allocs.sh` at line 77, Move the variable declaration for
this_alloc_peer_end_count out of the loop to avoid re-declaring it on each
iteration; declare it once as local before the loop that uses it (reference the
variable name this_alloc_peer_end_count and the enclosing loop in
nomad/scripts/ci_allocs.sh) and keep assignments/updates inside the loop.
150-162: Pipeline error handling may behave unexpectedly with set -eo pipefail.

With set -eo pipefail at the top of the script, the || result="" at the end of line 154 should catch failures, but the behavior can be subtle. If nomad succeeds but jq fails (e.g., malformed JSON), pipefail would cause the pipeline to return non-zero, and || result="" would then execute—this is likely the intended behavior.

However, there's a subtle case: if the file exists but is empty, jq --slurp 'last' returns null, and // 0 yields 0. This silently reports zero peers completed instead of triggering the fallback. Consider whether this is acceptable or if you want to treat 0 as a warning condition too.
💡 Optional: Add explicit check for zero result
 function fetch_peer_end_count() {
     local alloc_id="$1"
     local fallback_peer_count="$2"
     local result
     result=$(nomad alloc fs "$alloc_id" alloc/run_summary.jsonl 2>/dev/null | jq --slurp 'last | .peer_end_count // 0') || result=""
-    if [[ -z "$result" || "$result" == "null" ]]; then
+    if [[ -z "$result" || "$result" == "null" || "$result" == "0" ]]; then
         echo "Warning: could not fetch peer_end_count for alloc $alloc_id, falling back to configured peer_count ($fallback_peer_count)" >&2
         echo "$fallback_peer_count"
         return
     fi
     echo "Fetched peer_end_count for alloc $alloc_id: $result" >&2
     echo "$result"
 }
Alternatively, if 0 is a valid value (e.g., all agents failed), the current behavior is correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nomad/scripts/ci_allocs.sh` around lines 150 - 162, In fetch_peer_end_count,
handle the empty-file vs legitimate-zero ambiguity by first capturing nomad
output into a raw variable (e.g., raw_output) and checking if raw_output is
empty before running jq; if raw_output is empty or nomad/jq exit non-zero, log
the warning and echo the fallback_peer_count, otherwise run jq on raw_output and
if jq returns null/empty treat as fallback too, finally echo the numeric result;
update references inside fetch_peer_end_count (alloc_id, fallback_peer_count,
result) accordingly so the pipeline failure and empty-file cases both fall back
instead of silently returning 0.
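
The prompt above describes capturing the raw file before parsing it. A minimal sketch of that idea follows, using the variable name `raw_output` suggested in the prompt and jq's `// empty` so that a missing field produces no output while a genuine `0` is kept. This is one possible shape under those assumptions, not necessarily the code that was merged.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Sketch only: falls back when the file is missing, empty, or unparsable,
# but still returns a legitimate peer_end_count of 0.
fetch_peer_end_count() {
    local alloc_id="$1"
    local fallback_peer_count="$2"
    local raw_output result=""

    # Capture the file first so an empty or unreadable file is seen before jq runs.
    raw_output=$(nomad alloc fs "$alloc_id" alloc/run_summary.jsonl 2>/dev/null) || raw_output=""

    if [[ -n "$raw_output" ]]; then
        # Take the last JSONL record; `// empty` prints nothing when the field is null or absent.
        result=$(jq --slurp --raw-output 'last | .peer_end_count // empty' <<<"$raw_output" 2>/dev/null) || result=""
    fi

    if [[ -z "$result" ]]; then
        echo "Warning: could not fetch peer_end_count for alloc $alloc_id, falling back to configured peer_count ($fallback_peer_count)" >&2
        echo "$fallback_peer_count"
        return
    fi

    echo "Fetched peer_end_count for alloc $alloc_id: $result" >&2
    echo "$result"
}
```

Because jq treats only `null` and `false` as falsy, `0 // empty` still yields `0`, so a run in which every agent dropped is reported as zero rather than replaced by the configured count.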

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 53c51d9a-edf8-42e1-aa28-fa42ecbeb08e

📥 Commits

Reviewing files that changed from the base of the PR and between 46fe784 and 6de2fbe.

📒 Files selected for processing (1)

nomad/scripts/ci_allocs.sh

cocogitto-bot · 2026-03-19T08:45:46Z

✔️ f71d584 - Conventional commits check succeeded.

ThetaSinner

Looks good

veeso force-pushed the feat/504-get-peer_end_count-after-running-nomad-scenarios branch 4 times, most recently from d5e0380 to 6de2fbe, on March 18, 2026 17:00

veeso marked this pull request as ready for review March 19, 2026 08:41

coderabbitai bot reviewed Mar 19, 2026

feat(nomad): get peer_end_count by summing peer_end_count from ru…

f71d584

…n summaries inside of allocations by their run_id closes #504

veeso force-pushed the feat/504-get-peer_end_count-after-running-nomad-scenarios branch from 6de2fbe to f71d584 on March 19, 2026 08:45

veeso requested a review from a team March 19, 2026 08:47

ThetaSinner approved these changes Mar 19, 2026

veeso merged commit f45b7f6 into main Mar 19, 2026
30 of 58 checks passed

veeso deleted the feat/504-get-peer_end_count-after-running-nomad-scenarios branch March 19, 2026 16:54

veeso merged 1 commit into main from feat/504-get-peer_end_count-after-running-nomad-scenarios
