Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions .github/workflows/compile-perf-nightly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -229,8 +229,51 @@ jobs:
# job. trend.py exits non-zero (job red + ::error:: annotations) when a
# primary timer drifts past threshold vs the trailing median.
- name: Check trend (fail on regression)
id: trend
shell: bash

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Gap: New Slack step is unreachable in CI until the cron is re-enabled

The new step is gated if: github.event_name == 'schedule' && always(), but the schedule: block at lines 14-15 of this same workflow is intentionally commented out, and workflow_dispatch is explicitly excluded by the == 'schedule' check. Result: the entire Slack notification path — bash branch selection, inline python -c, urllib.urlopen call, payload JSON shape, secret-missing skip — has zero execution coverage in this PR and will continue to have zero coverage until someone uncomments the cron weeks or months later. A typo in the inline Python, an env-var name mismatch, or a payload-shape regression would surface only at that point, exactly when the nightly alert is supposed to start working.

Example: If os.environ["ICON"] were misspelled in the f-string, manual workflow_dispatch runs would never catch it; the first scheduled run would fail silently (job log shows a Python KeyError, but no Slack message arrives) and the on-call user would assume "no regressions" instead of "alerting broken".

Suggestion: Either (a) temporarily widen the gate to also fire on a tagged workflow_dispatch input so the path can be exercised end-to-end once before the cron is enabled — e.g. add a notify-slack boolean input and gate on (github.event_name == 'schedule' || github.event.inputs.notify-slack == 'true') && always(); or (b) extract the inline Python into tools/compile-perf/notify_slack.py with a small unit test asserting the JSON payload shape and the success/failure/else branch selection. Option (b) is preferable because it pins the contract without requiring a live run.

run: |
set -euo pipefail
cd "$GITHUB_WORKSPACE/tools/compile-perf"
python trend.py --results "$GITHUB_WORKSPACE/perf-results"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Gap: PR description names the wrong secret

The workflow reads secrets.SLACK_WEBHOOK_COMPILE_PERF, but the PR description tells the reader the secret is SLANG_PERF_SLACK_WEBHOOK. Whoever provisions the org secret based on the PR description will set the wrong name, and the new step will silently take the exit 0 "not set; skipping" branch on every scheduled run — so the cron will look green, but no notifications will ever fire. This is the same failure mode as #2 above, triggered through a different mechanism.

Suggestion: Update the PR description to name SLACK_WEBHOOK_COMPILE_PERF (the workflow is the source of truth), or rename the workflow's secrets.SLACK_WEBHOOK_COMPILE_PERF reference to match whichever convention is preferred. If the team has a naming convention (e.g. SLANG_* prefix for shader-slang org secrets), pick the conforming name and align both sides.


- name: Notify Slack
if: github.event_name == 'schedule' && always()
shell: bash
env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_COMPILE_PERF }}
TREND_OUTCOME: ${{ steps.trend.outcome }}
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
if [ -z "$SLACK_WEBHOOK" ]; then
echo "SLACK_WEBHOOK_COMPILE_PERF not set; skipping Slack notification"
exit 0
fi
DATE=$(date -u +%Y-%m-%d)
if [ "$TREND_OUTCOME" = "success" ]; then
ICON=":white_check_mark:"
STATUS="No regressions detected"
elif [ "$TREND_OUTCOME" = "failure" ]; then
ICON=":warning:"
STATUS="Regression detected — see CI run for details"
else
ICON=":x:"
STATUS="Nightly job failed — see CI run for details"
fi
python -c "
import json, os, urllib.request
payload = {
'text': (
f'{os.environ[\"ICON\"]} *Compile-perf nightly — {os.environ[\"DATE\"]}*\n'
f'{os.environ[\"STATUS\"]}\n'
Comment thread
jvepsalainen-nv marked this conversation as resolved.
Outdated
f'• Run: {os.environ[\"RUN_URL\"]}\n'
f'• Report: https://shader-slang.org/slang-compile-perf/ '
f'(may take a few minutes to update after the run)'
)
}
req = urllib.request.Request(
os.environ['SLACK_WEBHOOK'],
data=json.dumps(payload).encode(),
headers={'Content-Type': 'application/json'})
urllib.request.urlopen(req, timeout=10)
print('Slack notification sent')
"
2 changes: 1 addition & 1 deletion tools/compile-perf/DESIGN.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ via `analyze.canonical_runs`, so history and daily points compare like-with-like
After each nightly rebuild, `trend.py` compares the latest point's primary timers
(per workload, always including `compileInner`) against the trailing-N-point
median (default 7), restricted to the **same runner fingerprint**. A metric past
both a relative (`--rel`, default 1.25×) and absolute (`--abs`, default 2 ms)
both a relative (`--rel`, default 1.10×) and absolute (`--abs`, default 2 ms)
threshold is flagged — printed, emitted as a GitHub `::error::` annotation +
step-summary row, and the job exits non-zero (after the push, so the data is still
stored). If the latest point's runner differs from the history's, it warns and
Expand Down
12 changes: 7 additions & 5 deletions tools/compile-perf/trend.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
compares only against same-runner points.

python3 trend.py --results <perf-results> # after track.py rebuild
python3 trend.py --results <dir> --window 7 --rel 1.25 --abs 2.0
python3 trend.py --results <dir> --window 7 --rel 1.10 --abs 2.0
"""
import argparse
import json
Expand Down Expand Up @@ -58,9 +58,11 @@ def main():
formatter_class=argparse.RawDescriptionHelpFormatter)
ap.add_argument("--results", default=os.path.join(HERE, "results"))
# Threshold rationale:
# --rel 1.25: flag a 25% rise vs trailing median. The per-PR gate uses 15%;
# 25% here catches gradual drift that accumulates across many PRs without
# any single one tripping the per-PR gate.
# --rel 1.10: flag a 10% rise vs trailing median. The runner is a dedicated
# quiesced machine with 5-sample medians, giving a noise floor of ~1-3%,
# so 10% catches real medium regressions while avoiding noise false positives.
# The --abs guard (2 ms) prevents alerting on tiny absolute deltas even when
# the relative ratio exceeds 10%.
# --abs 2.0: ignore sub-2 ms absolute deltas regardless of ratio — a 50%
# rise in a 3 ms timer is within measurement noise, not a real regression.
# --window 7: trailing-7-point median spans ~one week of nightly runs,
Expand All @@ -69,7 +71,7 @@ def main():
# --min-baseline 3: require at least 3 prior same-runner points before judging,
# so the first few nights after a new runner don't produce false positives.
ap.add_argument("--window", type=int, default=7, help="trailing points for the median")
ap.add_argument("--rel", type=float, default=1.25, help="relative regression threshold")
ap.add_argument("--rel", type=float, default=1.10, help="relative regression threshold")
Comment thread
jvepsalainen-nv marked this conversation as resolved.
ap.add_argument("--abs", type=float, default=2.0, help="min absolute ms delta to flag")
ap.add_argument("--min-baseline", type=int, default=3,
help="min trailing points required to judge a metric")
Expand Down
Loading