compile-perf: Slack notification on scheduled runs + 10% regression threshold#11745
compile-perf: Slack notification on scheduled runs + 10% regression threshold#11745jvepsalainen-nv wants to merge 4 commits into
Conversation
On a dedicated quiesced runner with 5-sample medians the noise floor is ~1-3%, so 1.25x was too loose to catch medium regressions. 1.10x (10%) catches real regressions while the --abs 2ms guard prevents alerts on small absolute deltas.
…hreshold - trend.py: lower --rel default from 1.25 to 1.10 (10% threshold on quiesced runner with 5-sample medians; --abs 2ms still guards against small absolute deltas) - nightly workflow: add Slack notification step for scheduled runs (fires with if: always() so it reports both success and regression); uses SLANG_PERF_SLACK_WEBHOOK secret; includes CI run link and report link with caveat about Pages update delay
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe nightly compile-perf workflow now records the trend check result and posts a Slack notification on scheduled runs. The compile-perf drift threshold default changes from 1.25 to 1.10, and the example command and documentation are updated to match. ChangesCompile perf drift alerts
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9b4f552d-5643-4eaa-9ed5-2397068476de
📒 Files selected for processing (3)
.github/workflows/compile-perf-nightly.ymltools/compile-perf/DESIGN.mdtools/compile-perf/trend.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
.github/workflows/compile-perf-nightly.yml (1)
251-267: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winExport payload vars before invoking Python.
DATE,ICON, andSTATUSare shell locals, but the Python snippet reads them fromos.environ; withoutexport, this step can fail withKeyErrorand skip Slack notification.Suggested fix
- DATE=$(date -u +%Y-%m-%d) + export DATE=$(date -u +%Y-%m-%d) if [ "$TREND_OUTCOME" = "success" ]; then - ICON=":white_check_mark:" - STATUS="No regressions detected" + export ICON=":white_check_mark:" + export STATUS="No regressions detected" elif [ "$TREND_OUTCOME" = "failure" ]; then - ICON=":warning:" - STATUS="Regression detected — see CI run for details" + export ICON=":warning:" + export STATUS="Regression detected — see CI run for details" else - ICON=":x:" - STATUS="Nightly job failed — see CI run for details" + export ICON=":x:" + export STATUS="Nightly job failed — see CI run for details" fiAs per path instructions,
.github/workflows/**changes must verify secret handling and notification-path correctness.Source: Path instructions
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 814a1cef-79f2-4c58-bf79-94c4ade668b6
📒 Files selected for processing (1)
.github/workflows/compile-perf-nightly.yml
- Export ICON/STATUS/DATE shell vars so os.environ can read them in Python - Update analyze.py classify() docstring: step_thr references 1.10 not 1.25, drift_thr explains why it stays at 1.25 (release noise vs nightly noise)
There was a problem hiding this comment.
Verdict: 🟡 Has issues — 0 bug(s), 2 gap(s)
Tooling/CI-only PR: lowers trend.py --rel default 1.25 → 1.10 and adds a Slack notification step to the nightly compile-perf workflow. Threshold-change rationale comments are updated consistently across trend.py, analyze.py, and DESIGN.md. Two coverage gaps around the new Slack path: the if: excludes the only currently-enabled trigger so the path never runs in CI, and the PR description names a secret that doesn't match the workflow.
Changes Overview
Regression threshold tightening (tools/compile-perf/trend.py, tools/compile-perf/lib/analyze.py, tools/compile-perf/DESIGN.md)
- What changed:
trend.py's--reldefault drops from 1.25 (25%) to 1.10 (10%) with a new rationale citing a dedicated quiesced runner and ~1-3% noise floor.analyze.py'sclassify()keepsstep_thr=1.4anddrift_thr=1.25but the rationale comments are updated to explain why those release-history thresholds intentionally stay higher thantrend.py's new nightly default.DESIGN.mdupdates the matching sentence.
Slack notification on schedule runs (.github/workflows/compile-perf-nightly.yml)
- What changed: adds
id: trendto the existing "Check trend" step and appends a new "Notify Slack" step gated byif: github.event_name == 'schedule' && always(). The step exits early ifSLACK_WEBHOOK_COMPILE_PERFis unset, otherwise branches onsteps.trend.outcome(success/failure/else → ✅/⚠️ /❌) and POSTs a JSON payload with the run URL and report URL to the webhook via an inlinepython -curllib.request.urlopencall.
Findings (2 total)
| Severity | Location | Finding |
|---|---|---|
| 🟡 Gap | .github/workflows/compile-perf-nightly.yml:233 |
New Slack step is unreachable in CI until the disabled cron is re-enabled — workflow_dispatch runs are excluded by == 'schedule', so payload/branch logic gets zero coverage |
| 🟡 Gap | .github/workflows/compile-perf-nightly.yml:237 |
PR description names secret SLANG_PERF_SLACK_WEBHOOK; workflow uses SLACK_WEBHOOK_COMPILE_PERF — operator provisioning from the description will silently land in the "skipping" branch |
| # primary timer drifts past threshold vs the trailing median. | ||
| - name: Check trend (fail on regression) | ||
| id: trend | ||
| shell: bash |
There was a problem hiding this comment.
🟡 Gap: New Slack step is unreachable in CI until the cron is re-enabled
The new step is gated if: github.event_name == 'schedule' && always(), but the schedule: block at lines 14-15 of this same workflow is intentionally commented out, and workflow_dispatch is explicitly excluded by the == 'schedule' check. Result: the entire Slack notification path — bash branch selection, inline python -c, urllib.urlopen call, payload JSON shape, secret-missing skip — has zero execution coverage in this PR and will continue to have zero coverage until someone uncomments the cron weeks or months later. A typo in the inline Python, an env-var name mismatch, or a payload-shape regression would surface only at that point, exactly when the nightly alert is supposed to start working.
Example: If os.environ["ICON"] were misspelled in the f-string, manual workflow_dispatch runs would never catch it; the first scheduled run would fail silently (job log shows a Python KeyError, but no Slack message arrives) and the on-call user would assume "no regressions" instead of "alerting broken".
Suggestion: Either (a) temporarily widen the gate to also fire on a tagged workflow_dispatch input so the path can be exercised end-to-end once before the cron is enabled — e.g. add a notify-slack boolean input and gate on (github.event_name == 'schedule' || github.event.inputs.notify-slack == 'true') && always(); or (b) extract the inline Python into tools/compile-perf/notify_slack.py with a small unit test asserting the JSON payload shape and the success/failure/else branch selection. Option (b) is preferable because it pins the contract without requiring a live run.
| run: | | ||
| set -euo pipefail | ||
| cd "$GITHUB_WORKSPACE/tools/compile-perf" | ||
| python trend.py --results "$GITHUB_WORKSPACE/perf-results" |
There was a problem hiding this comment.
🟡 Gap: PR description names the wrong secret
The workflow reads secrets.SLACK_WEBHOOK_COMPILE_PERF, but the PR description tells the reader the secret is SLANG_PERF_SLACK_WEBHOOK. Whoever provisions the org secret based on the PR description will set the wrong name, and the new step will silently take the exit 0 "not set; skipping" branch on every scheduled run — so the cron will look green, but no notifications will ever fire. This is the same failure mode as #2 above, triggered through a different mechanism.
Suggestion: Update the PR description to name SLACK_WEBHOOK_COMPILE_PERF (the workflow is the source of truth), or rename the workflow's secrets.SLACK_WEBHOOK_COMPILE_PERF reference to match whichever convention is preferred. If the team has a naming convention (e.g. SLANG_* prefix for shader-slang org secrets), pick the conforming name and align both sides.
Two improvements to the nightly compile-perf monitoring.
Changes
tools/compile-perf/trend.pyLower
--reldefault from 1.25 (25%) to 1.10 (10%). The runner is a dedicated quiesced machine with 5-sample medians, giving a noise floor of ~1–3%, so 10% catches real medium regressions while avoiding false positives. The--abs 2msguard still prevents alerts on small absolute deltas..github/workflows/compile-perf-nightly.ymlid: trendto the Check trend step so its outcome is accessiblescheduleruns withif: always()— posts pass/fail status, CI run link, and report link to the channel configured by theSLANG_PERF_SLACK_WEBHOOKsecret (silently skips if secret not set)Note on in-flight PRs
This PR may conflict with #11698 (#11702, #11727 also touch the same file) at the trend step line. The conflict is trivial — just a
python→python3rename at the same location that can be resolved by whoever merges last.