compile-perf: fix Windows runner — add bash to PATH, use python3#11698
compile-perf: fix Windows runner — add bash to PATH, use python3#11698jvepsalainen-nv wants to merge 20 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughTwo GitHub Actions workflows are updated to use explicit Python 3 setup via ChangesCompile Perf Workflow and Tool Updates
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The release sweep uses shell: bash steps but doesn't go through common-setup (no build needed), so Git Bash wasn't on PATH. Also replace bare 'python' with 'python3' throughout both workflows — Windows doesn't have a 'python' alias by default.
070e28d to
00bf550
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a2e0bc20-ad08-4142-a3fc-ccb3707b0ecc
📒 Files selected for processing (2)
.github/workflows/compile-perf-nightly.yml.github/workflows/compile-perf-release-sweep.yml
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 4
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a2e0bc20-ad08-4142-a3fc-ccb3707b0ecc
📒 Files selected for processing (2)
.github/workflows/compile-perf-nightly.yml.github/workflows/compile-perf-release-sweep.yml
🛑 Comments failed to post (4)
.github/workflows/compile-perf-nightly.yml (4)
18-20: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Default blank
reftomaster, not the dispatch branch.Line 19 promises “blank = master HEAD”, but Line 50 uses
github.ref, which is the branch/tag selected when manually dispatching the workflow. That can record non-master runs as nightly ToT data.🐛 Proposed fix
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5 with: - ref: ${{ github.event.inputs.ref || github.ref }} + ref: ${{ github.event.inputs.ref || 'master' }} persist-credentials: falseAs per path instructions, CI/CD workflow changes must verify correct trigger conditions.
Also applies to: 48-51
Source: Path instructions
28-30: 🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Description: Inspect compile-perf workflows that write the perf-results repo and compare concurrency groups. # Expected: workflows that push to shader-slang/slang-compile-perf share one non-canceling concurrency group. set -euo pipefail for f in .github/workflows/compile-perf-nightly.yml .github/workflows/compile-perf-release-sweep.yml; do [ -f "$f" ] || continue echo "### $f" sed -n '/^concurrency:/,/^[^[:space:]]/p' "$f" | sed '$d' || true rg -n 'PERF_RESULTS_REPO|slang-compile-perf|git push' "$f" || true echo doneRepository: shader-slang/slang
Length of output: 1633
Use the same concurrency group for both
compile-perf-nightlyandcompile-perf-release-sweepworkflows.Both workflows push to the same
shader-slang/slang-compile-perfrepository and gh-pages branch, but they use separate concurrency groups (compile-perf-nightlyandcompile-perf-release-sweep). This allows them to run concurrently, risking non-fast-forward push failures when both attempt simultaneous pushes. Consolidate them under a single shared concurrency group withcancel-in-progress: falseto serialize perf-results writes.
44-46: 📐 Maintainability & Code Quality | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Description: Check whether the custom nvrgfx-perf runner label is configured for actionlint. # Expected: actionlint configuration includes nvrgfx-perf, or the workflow label matches an existing configured custom label. set -euo pipefail echo "## actionlint-related files" fd -i 'actionlint|\.github' . -t f | sed -n '1,120p' echo echo "## nvrgfx-perf references" rg -n 'nvrgfx-perf|runner-label|self-hosted' .github -g '*.yml' -g '*.yaml' || trueRepository: shader-slang/slang
Length of output: 3295
🏁 Script executed:
cat -n .github/actionlint.yamlRepository: shader-slang/slang
Length of output: 408
🏁 Script executed:
sed -n '40,50p' .github/workflows/compile-perf-nightly.ymlRepository: shader-slang/slang
Length of output: 567
Add
nvrgfx-perfto the actionlint configuration or update the runner label.The
nvrgfx-perflabel at line 46 is not configured in.github/actionlint.yaml. If actionlint is enforced in CI, this workflow will fail linting. Either addnvrgfx-perfto theself-hosted-runner.labelslist in.github/actionlint.yaml, or use an existing configured label likeperf.🧰 Tools
🪛 actionlint (1.7.12)
[error] 46-46: label "nvrgfx-perf" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2025-vs2026", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xlarge", "macos-latest-large", "macos-26-intel", "macos-26-xlarge", "macos-26-large", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xlarge", "macos-14-large", "macos-14", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows", "arc", "benchmark", "build", "falcor", "GCP-T4", "GPU", "perf", "regression-test", "SM80Plus", "vulkancts". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
Source: Linters/SAST tools
155-179: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Don’t deploy Pages after a failed report generation.
Because report generation is
continue-on-error, later deploy steps still run. Ifreport.pyleaves a partialanalysisdirectory before failing, Lines 209-217 can publish incomplete reports.🛡️ Proposed fix
- name: Generate HTML reports + id: generate_html_reports continue-on-error: true shell: bash run: | set -euo pipefail @@ - name: Checkout gh-pages branch + if: steps.generate_html_reports.outcome == 'success' continue-on-error: true uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5 with: @@ - name: Deploy HTML reports to GitHub Pages + if: steps.generate_html_reports.outcome == 'success' continue-on-error: true shell: bash env:Also applies to: 209-217
🧰 Tools
🪛 zizmor (1.26.1)
[warning] 169-176: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
|
Heads-up: this PR was evicted from the merge queue by an infra flake — a single 600s per-test timeout in Falcor |
There was a problem hiding this comment.
Verdict: 🟡 Has issues — 0 bugs, 2 gaps
The PR fixes two real Windows-runner regressions (bash not on PATH on the release-sweep workflow; python vs python3 interpreter mismatch) and replaces the destructive cp releases/index.json "$RESULTS/index.json" with a track.py merge-index subcommand that preserves the historical release set. The remaining concerns are a stale doc reference outside the touched-file set and the absence of any automated test for the new merge_index logic.
Changes Overview
Windows runner fixes (.github/workflows/compile-perf-nightly.yml, .github/workflows/compile-perf-release-sweep.yml)
- Adds
actions/setup-python@v6(pinned to SHAa309ff8…, same pin already used elsewhere in the repo). - Release-sweep adds a
pwshstep that appendsC:\Program Files\Git\binandGit\usr\bintoGITHUB_PATH(matches the wording in.github/actions/common-setup/action.yml, which nightly already inherits viacommon-setup). - All inline
python …invocations are switched topython3 …, including the$(python3 … track.py runner-id)command substitution inside the release-sweep commit-message line. - Release-sweep gets a new
if: always()cleanup step thatrm -rf "$GITHUB_WORKSPACE/tools/compile-perf/releases".
track.py merge-index and index merge semantics (.github/workflows/compile-perf-release-sweep.yml, tools/compile-perf/track.py)
- The destructive
cp releases/index.json "$RESULTS/index.json"line is replaced withpython3 track.py merge-index --results "$RESULTS" --index releases/index.json. - The new
merge_indexfunction reads any existingindex.json, keys both inputs bytag(new wins on collision), sorts merged entries bydate, and writes the result.register,rebuild, andstamp-runnercontinue to run afterwards. - All JSON file writes in
track.pygainencoding="utf-8".
Drop diagnostics_errors workload (tools/compile-perf/bench.py, lib/manifest.py, lib/workloads.py, README.md)
- The intentionally-failing
diagnostics_errorsworkload (and itsexpect_fail/expected_failplumbing onWorkloadSpecand the per-sample result record) is removed;slangcwas aborting before emitting timer output for it, so it contributed no perf data. gen_diagnostics_clean's docstring is rewritten; the README row and theerrors − cleancaveat are dropped.
Encoding fixes for Windows HTML/SVG generation (tools/compile-perf/breakdown.py, tools/compile-perf/report.py)
- Every
open(…, "w")writing HTML/SVG and everyopen(…)reading SVG content gainsencoding="utf-8". Fixes theUnicodeEncodeErrorproduced by Windows Python'scp1252default on glyphs like→and×, and the×mojibake when SVG output was re-read. - Per-workload page back link changes from
../report_per_workload.htmlto../index.html, matching the deploy step that renames the file.
Findings (2 total)
| Severity | Location | Finding |
|---|---|---|
| 🟡 Gap | tools/compile-perf/lib/analyze.py:12 |
Module docstring still claims it "derives the diagnostics path-cost series (errors - clean)"; with diagnostics_errors removed there is nothing to subtract. File is not in the PR's touched-file set. |
| 🟡 Gap | tools/compile-perf/track.py:166-186 (new merge_index) |
New non-trivial merge logic has no automated test; missing-tag raises KeyError, missing-date silently sorts to the front, and several JSON loads use bare open() without with. |
| @@ -10,7 +10,6 @@ | |||
| - Every compilable workload exposes a ``[shader("compute")]`` entry named | |||
| ``computeMain`` so ``slangc`` auto-discovers it; results are written to a | |||
There was a problem hiding this comment.
🟡 Gap: Stale diagnostics_errors references remain in lib/analyze.py and other docstrings the PR did not touch
The PR drops the diagnostics_errors workload from manifest.py and the gen_diagnostics_errors generator from workloads.py, and updates the README table. But tools/compile-perf/lib/analyze.py is not in the touched-file set, and its module docstring still claims:
"""Stack per-release perf results into time-series and flag regressions.
...
Also derives the diagnostics path-cost series (errors - clean).
"""
(lib/analyze.py:12)
After this PR there is no errors series to subtract, so the docstring describes a behavior that the module no longer (and no longer can) implement. This is the same class of stale-doc issue the PR is fixing in workloads.py and the README — it just missed this file.
Suggestion: drop line 12 of lib/analyze.py, since no remaining function computes errors − clean. While there, grep once more for diagnostics_errors, expect_fail, expected_fail, and errors - clean / errors − clean across tools/compile-perf/ to make sure nothing else points at the removed workload — e.g. trailing references in sweep.py, trend.py, or fixture data.
|
|
||
|
|
||
| def stamp_runner(results_dir, label): | ||
| rp = os.path.join(results_dir, "runner.json") |
There was a problem hiding this comment.
🟡 Gap: merge_index is new non-trivial logic with no automated test, and several edge cases are silently ambiguous
def merge_index(results_dir, new_index_path):
...
dest = os.path.join(results_dir, "index.json")
existing = {}
if os.path.exists(dest):
for r in json.load(open(dest)):
existing[r["tag"]] = r
n_before = len(existing)
for r in json.load(open(new_index_path)):
existing[r["tag"]] = r
merged = sorted(existing.values(), key=lambda r: r.get("date", ""))merge_index replaces the previous cp releases/index.json "$RESULTS/index.json" line and is now the single source of truth for what releases the report covers. There is no test_*.py anywhere under tools/compile-perf/ (Glob tools/compile-perf/**/test_*.py returns zero results) and no CI step that exercises the function with anything other than the real production input on the self-hosted runner — i.e. the only signal that the merge worked is that the next release-sweep nightly produced a non-broken report.
Behaviors that are not pinned by any test:
- Missing
tagfield.existing[r["tag"]] = rraisesKeyErrorfor an entry withouttag. Cold-start with a freshly fetchedreleases/index.jsonof unknown shape will abort the whole sweep job mid-way through commit/push rather than fail early. - Missing
datefield.key=lambda r: r.get("date", "")silently sorts dateless entries to the front of the series, which then propagates into the tracking series and the HTML report — a bug an automated test would catch instantly but a human eyeballing the live report likely will not. - Duplicate tags within a single input. Last-wins, but this is undocumented.
- Mixed-shape merge (existing index from a previous tool version vs new index). No schema check.
- Empty/malformed JSON in either file. Bubbles up as a raw
JSONDecodeError; the runner is left with the cleanup step racing the half-writtenindex.json.
Coupled with the fact that json.load(open(dest)) and json.load(open(new_index_path)) don't use with, a KeyError mid-iteration leaves the read handle on dest open while the cleanup paths run, which on Windows can interact badly with the subsequent merged → open(dest, "w") in a retry / re-run scenario.
Suggestion: add a minimal tools/compile-perf/tests/test_track_merge_index.py (pytest, stdlib-only) covering at least: (a) cold-start (no dest file); (b) merge overrides by tag; (c) sort by date; (d) missing-tag and missing-date raise or have a documented fallback. Wire it into a pull_request paths-filtered job (paths: [tools/compile-perf/**, .github/workflows/compile-perf-*.yml]) so changes in this directory get the test run on ubuntu-latest before they reach the self-hosted runner. While there, wrap the two json.load(open(...)) calls in with blocks for symmetry with the new with open(dest, "w", encoding="utf-8") in the same function.
Fixes several issues discovered when running the compile-perf workflows on the nvrgfx-perf Windows self-hosted runner after the initial merge of #11485.
Changes
Release sweep workflow (
compile-perf-release-sweep.yml)Add bash to PATH (Windows)step — the release sweep doesn't usecommon-setup(no build needed), so Git Bash wasn't on PATH forshell: bashstepsSet up Pythonstep — Python is not on PATH by default on the self-hosted runnercp releases/index.json "$RESULTS/index.json"withtrack.py merge-index— merges newly swept releases into the existing index rather than overwriting it, so HTML reports are generated from all historical data regardless of the sweep date windowNightly workflow (
compile-perf-nightly.yml)Set up Pythonstep — Python is not on PATH by default on the self-hosted runnerpython3instead ofpythonthroughout —pythonis not available on the runnerbench.pydiagnostics_errorsworkload — slangc aborts before emitting timer output when compilation fails with many errors; the workload contributed no performance dataexpect_failplumbing — no workload setsexpect_fail=Trueafterdiagnostics_errorsis removedreport.py/breakdown.pyencoding="utf-8"— Windows Python defaults tocp1252which can't encode Unicode characters (e.g.→,×) used in the generated HTML, causingUnicodeEncodeErrorencoding="utf-8"— fixes garbled×rendering of×on the generated pagesbreakdown.pythat were missingencoding="utf-8"(workload.htmlfiles andbreakdown.svg)../report_per_workload.html→../index.htmllib/manifest.pydiagnostics_errorsWorkloadSpec entry andexpect_failfieldlib/workloads.pygen_diagnostics_errorsgenerator; updategen_diagnostics_cleandocstringtools/compile-perf/README.mddiagnostics_cleandescription — no longer a "control forerrors − clean" sincediagnostics_errorsis gonetrack.pymerge-indexcommand: merges a newindex.jsoninto the existing results repo index by tag, preserving all prior release entriesencoding="utf-8"to all JSON file writesTest plan
×,·) render correctly on the generated pages