CI analytics: derive hosted-runner cap from org plan tier (Team=60)#11895
jkiviluoto-nv wants to merge 2 commits into
The Slang org moved to GitHub Team, raising the GitHub-hosted-runner concurrency cap from the Free-tier 20 to 60. The analytics sampler hard-coded 20, so the health dashboard reported usage against a stale cap. Rather than bump the constant, query the cap dynamically from the org's plan tier so future plan changes are picked up with no code edit.
The cap isn't exposed directly by any API, but it is a fixed function of the plan: `orgs/<org>.plan.name` maps through free=20, team=60, enterprise=180. `fetch_org_plan_cap` performs that lookup and returns `None` (never raises) on API error, a missing `plan` field (external/fork tokens lack org visibility), or an unrecognized tier; the caller then falls back to `DEFAULT_HOSTED_RUNNER_CAP`, now set to the Team value.
`sample_hosted_runner_usage(repo, cap=None)` auto-detects when `cap` is `None` and still accepts an explicit override from `--cap` or tests. Downstream consumers already read the cap from the snapshot and use percentage thresholds, so they scale automatically.
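The lookup described above can be sketched as follows. This is a minimal illustration, not the PR's code: `cap_from_org_payload` is a hypothetical helper that takes an already-fetched `orgs/<org>` response, so the API call and the stderr warnings are left out. The tier values are the ones quoted in this description.

```python
from typing import Optional

# Tier-to-cap table, using the values quoted in the description above.
PLAN_TIER_HOSTED_RUNNER_CAP = {"free": 20, "team": 60, "enterprise": 180}


def cap_from_org_payload(payload: object) -> Optional[int]:
    """Hypothetical helper: map an `orgs/<org>` JSON payload to a cap.

    Returns None (never raises) when the payload is not a dict, the
    `plan` field is missing, or the tier name is not in the table.
    """
    if not isinstance(payload, dict):
        return None
    plan = payload.get("plan")
    tier = plan.get("name") if isinstance(plan, dict) else None
    if not isinstance(tier, str) or not tier:
        return None
    # Lower-case the tier so "Team" and "team" resolve identically.
    return PLAN_TIER_HOSTED_RUNNER_CAP.get(tier.lower())
```

Keeping every failure mode on the same `None` return lets the caller decide in one place what to do when the tier is not known.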
📝 Walkthrough
Hosted-runner cap handling now auto-detects the org plan tier, preserves unknown caps instead of guessing, and updates CLI, chart, console, and HTML output to render known and unknown cap cases differently.
Changes
Hosted runner cap auto-detection
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c27465de-6ba3-475a-a29d-474d09a13a2f
📒 Files selected for processing (3)
extras/ci/analytics/ci_health.py
extras/ci/analytics/ci_hosted_runner_usage.py
extras/ci/analytics/tests/test_ci_analytics.py
 from gh_api import gh_api, gh_api_list

 DEFAULT_REPO = "shader-slang/slang"

-# The Slang org runs on the standard public-repo concurrent-runner cap
-# of 20 hosted runners shared across all labels. The cap is per-org,
-# not per-label.
-DEFAULT_HOSTED_RUNNER_CAP = 20
+# GitHub's standard concurrent-runner cap for GitHub-hosted runners, by
+# plan tier. This is the total number of hosted runners an account can
+# run at once across all labels; it is a per-account limit, not
+# per-label and not per-repo. Values are GitHub's published standard
+# limits (https://docs.github.com/actions/reference/usage-limits).
+PLAN_TIER_HOSTED_RUNNER_CAP = {
+    "free": 20,
+    "team": 60,
+    "enterprise": 180,
+}
+
+# Fallback cap used only when the org plan cannot be queried. Set to the
+# Team-tier value because the Slang org is on GitHub Team (60 concurrent
+# hosted runners); see the plan map above.
+DEFAULT_HOSTED_RUNNER_CAP = PLAN_TIER_HOSTED_RUNNER_CAP["team"]
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
python3 - <<'PY'
from pathlib import Path
p = Path('extras/ci/analytics/ci_hosted_runner_usage.py')
print(p.exists(), p)
if p.exists():
    for i, line in enumerate(p.read_text().splitlines(), 1):
        if 25 <= i <= 60:
            print(f"{i:4}: {line}")
PY
Repository: shader-slang/slang
Length of output: 1675
🌐 Web query:
GitHub Actions usage limits Enterprise Cloud standard runners 500 concurrent jobs 180 hosted runners official docs
💡 Result:
For GitHub Enterprise Cloud plans, the default job concurrency limit for standard GitHub-hosted runners is 500 total concurrent jobs [1][2]. Within this limit, there is a maximum of 50 concurrent jobs allowed for macOS runners [1][2]. These concurrency limits are subject to the following details:
- Concurrency Increases: GitHub Support can increase these job concurrency limits upon request via a support ticket [1][2].
- Larger Runners: These limits apply to standard GitHub-hosted runners. Larger runners have different concurrency configurations, generally allowing up to 1,000 concurrent jobs (with specific limits per runner type and architecture, such as 50 concurrent jobs for macOS larger runners) [1][2]. Note that the maximum concurrent macOS jobs limit is shared across both standard GitHub-hosted runners and GitHub-hosted larger runners [1][2].
- Other Limits: Beyond job concurrency, other usage limits for GitHub Enterprise Cloud include 50,000 included minutes per month and 50 GB of artifact storage [1][3].
Citations:
- 1: https://docs.github.com/en/actions/reference/limits
- 2: https://github.com/github/docs/blob/main/content/actions/reference/limits.md
- 3: http://help.github.com/en/enterprise-cloud@latest/actions/hosting-your-own-runners/managing-self-hosted-runners/usage-limits-for-self-hosted-runners
🌐 Web query:
site:docs.github.com/actions/reference/usage-limits GitHub-hosted runners concurrent standard runners Enterprise Cloud 500 180
💡 Result:
For GitHub Enterprise Cloud, the standard GitHub-hosted runner concurrency limit is 500 concurrent jobs, with 50 concurrent macOS jobs. (docs.github.com)
If you meant larger runners, that’s a different limit/model. (docs.github.com)
Citations:
- 1: https://docs.github.com/en/actions/reference/limits?utm_source=openai
- 2: https://docs.github.com/en/actions/reference/limits?utm_source=openai
Update the Enterprise cap to 500.
extras/ci/analytics/ci_hosted_runner_usage.py:42-45 sets PLAN_TIER_HOSTED_RUNNER_CAP["enterprise"] = 180, but GitHub’s current limits page lists Enterprise Cloud standard GitHub-hosted runners at 500 concurrent jobs (50 macOS). This value feeds the reported headroom, so Enterprise orgs will be understated until it’s corrected.
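To see why the table value matters, here is a small worked example. It assumes the 500 figure from the query results above is the correct Enterprise limit and uses a made-up load of 150 busy runners; `usage_percent` is a hypothetical helper, not a function from the PR.

```python
def usage_percent(in_use: int, cap: int) -> float:
    """Share of the cap currently in use, as a percentage."""
    return 100 * in_use / cap


IN_USE = 150  # made-up load, for illustration only

# Compare the same load against the current table value and the
# value cited in the review.
for cap in (180, 500):
    pct = usage_percent(IN_USE, cap)
    print(f"cap={cap}: {pct:.1f}% used, headroom {cap - IN_USE}")
```

Under these assumptions the same load reads as 83.3% with a headroom of 30 runners against the 180 entry, and as 30.0% with a headroom of 350 against 500.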
… chart
Two follow-ups to the dynamic-cap change:
Unknown cap instead of a guessed fallback. When the org plan can't be queried (external/fork token, API error, unrecognized tier), resolve_hosted_runner_cap now returns None rather than substituting the Team value. A wrong denominator (reporting N/60 when the real cap might be 20) silently mis-scales every reading, so it's worse than none. The CLI summary drops the "/ cap (pct%)" and warns; the health page shows an UNKNOWN CAP banner with raw usage and an explanatory note. DEFAULT_HOSTED_RUNNER_CAP is removed; PLAN_TIER_HOSTED_RUNNER_CAP is the single source of truth and "unknown" is now modeled as None end to end.
Readable usage chart. A full-height dashed cap line at 60 forced the y-axis to auto-scale to the cap, crushing the real usage band (usually 10-25) into an unreadable sliver. Replace it with faint amber/red threshold zones (>=80% / >=100% of cap) painted by a beforeDraw plugin relative to the current y-scale: the axis now fits the data, and the zones only appear once usage climbs into the warn range. A subtitle states the cap and thresholds. When the cap is unknown, no zones or subtitle are drawn.
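The two behaviours in this commit message can be sketched in Python. This is an illustration of the logic only: the chart itself is drawn by a beforeDraw plugin, and `format_usage` and `threshold_zones` are hypothetical names, as is the exact wording of the unknown-cap summary.

```python
from typing import List, Optional, Tuple


def format_usage(in_use: int, cap: Optional[int]) -> str:
    """Summary text; omits the '/ cap (pct%)' part when the cap is unknown."""
    if cap is None:
        return f"{in_use} hosted runners in use (cap unknown)"
    pct = round(100 * in_use / cap)
    return f"{in_use} / {cap} ({pct}%)"


def threshold_zones(
    cap: Optional[int], y_max: float, warn_percent: int = 80
) -> List[Tuple[str, float, float]]:
    """Bands (label, low, high) clipped to the visible axis [0, y_max].

    A band is returned only if the axis already reaches it, so the
    y-axis is never stretched just to show the cap.
    """
    if cap is None:
        return []  # unknown cap: draw no zones at all
    warn_at = cap * warn_percent / 100
    zones: List[Tuple[str, float, float]] = []
    if y_max > warn_at:
        zones.append(("warn", warn_at, min(cap, y_max)))
    if y_max > cap:
        zones.append(("alarm", cap, y_max))
    return zones
```

With a cap of 60 and a y-axis topping out at 30, `threshold_zones` returns no bands, which matches the goal of letting the axis fit the data until usage approaches the warn range.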
Follow-up commit
1. Unknown cap instead of a guessed fallback. When the org plan can't be queried (external/fork token, API error, unrecognized tier), the cap is now reported as unknown (…
2. Readable usage chart. A full-height dashed cap line at 60 forced the y-axis to auto-scale to the cap, crushing the real usage band (typically ~10–25) into an unreadable sliver at the bottom. It's replaced with faint amber/red threshold zones (≥80% / ≥100% of cap) painted by a …
100 analytics tests pass.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
extras/ci/analytics/ci_hosted_runner_usage.py (1)
236-257: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Stale "using fallback cap" warnings: no fallback exists anymore.
Both warnings in `fetch_org_plan_cap` claim a fallback is being used, but the function immediately returns `None` right after. Per the PR's own stated goal (and the function's docstring at Lines 230-233), there is no fallback anymore; the cap becomes "unknown". These messages will mislead anyone reading stderr into thinking a guessed cap was substituted, when in fact `resolve_hosted_runner_cap` / `sample_hosted_runner_usage` will report the cap as `None` downstream.
🐛 Proposed fix
     data, err = gh_api(f"orgs/{org}")
     if err or not isinstance(data, dict):
         print(
             f"Warning: could not query plan for org {org}: "
-            f"{err or 'unexpected response'}; using fallback cap.",
+            f"{err or 'unexpected response'}; cap will be reported as unknown.",
             file=sys.stderr,
         )
         return None
     plan = data.get("plan")
     tier = plan.get("name") if isinstance(plan, dict) else None
     if not tier:
         # No `plan` field means the token lacks org visibility. Don't warn
         # loudly — this is expected for external/fork tokens.
         return None
     cap = PLAN_TIER_HOSTED_RUNNER_CAP.get(tier.lower())
     if cap is None:
         print(
             f"Warning: unrecognized GitHub plan tier {tier!r} for org "
-            f"{org}; using fallback cap.",
+            f"{org}; cap will be reported as unknown.",
             file=sys.stderr,
         )
         return None
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ba33eaf4-e055-45d7-a374-27b68c8bc595
📒 Files selected for processing (3)
extras/ci/analytics/ci_health.py
extras/ci/analytics/ci_hosted_runner_usage.py
extras/ci/analytics/tests/test_ci_analytics.py
Motivation
The Slang org moved to GitHub Team, which raises the GitHub-hosted-runner concurrency cap from the Free-tier 20 to 60 (`gh api orgs/shader-slang --jq .plan` → `{"name":"team", ...}`). The health-dashboard sampler (`extras/ci/analytics/ci_hosted_runner_usage.py`) hard-coded `DEFAULT_HOSTED_RUNNER_CAP = 20`, so every "hosted runners in use" reading, the `AT CAP`/`HIGH` banners, and the stacked usage chart were computed against a stale cap: a run at 30/60 (healthy, 50%) would have shown as 30/20 (150%, false alarm).
Proposed solution
Rather than bump the constant to 60 (which goes stale again on the next plan change), query the cap dynamically from the org's plan tier. The concurrency cap isn't exposed directly by any API, but it is a fixed function of the plan, so `orgs/<org>.plan.name` maps cleanly: free → 20, team → 60, enterprise → 180.
- `fetch_org_plan_cap(org)` performs the lookup and returns `None` (never raises) on any API error, a missing `plan` field (external/fork tokens lack org visibility), or an unrecognized tier.
- `resolve_hosted_runner_cap(repo)` prefers the live value and falls back to `DEFAULT_HOSTED_RUNNER_CAP`, which is now set to the Team value (60), the org's actual plan, so even the fallback is correct today.
- `sample_hosted_runner_usage(repo, cap=None)` auto-detects when `cap` is `None`; an explicit integer (from `--cap` or a test) still overrides.
Downstream consumers (`ci_health.py` rendering, chart building, CLI summary) already read the cap out of the snapshot and use percentage-based thresholds (warn ≥80%, alarm at cap), so they scale to the new value with no further edits.
Change summary
| File | Change |
| --- | --- |
| `extras/ci/analytics/ci_hosted_runner_usage.py` | `PLAN_TIER_HOSTED_RUNNER_CAP` map, `org_from_repo`, `fetch_org_plan_cap`, `resolve_hosted_runner_cap`; `sample_hosted_runner_usage(cap=None)` auto-detects; `DEFAULT_HOSTED_RUNNER_CAP` → 60 (Team); `--cap` default → `None` (auto-detect) |
| `extras/ci/analytics/ci_health.py` | `(plan not queryable, using fallback)` when it lands on the fallback cap |
| `extras/ci/analytics/tests/test_ci_analytics.py` | `TestHostedRunnerCapResolution` (8 tests): org parse, all three tiers + case-insensitivity, no-plan / API-error / unknown-tier fallbacks, auto-detect path, fallback == 60 |
Verification
cap = 60, no stderr warnings; real usage rendered `15 / 60 (25%)`.
`--cap 20` override and `py_compile`/`pyflakes` clean.
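The precedence described under Proposed solution (explicit override, then live plan lookup, then fallback) can be sketched as below. `resolve_cap` and its `fetch_live` callback are hypothetical stand-ins, not the PR's functions, so the example runs without network access; `FALLBACK_CAP` mirrors the Team value stated in this description.

```python
from typing import Callable, Optional

FALLBACK_CAP = 60  # Team-tier value, as stated in the description above


def resolve_cap(
    explicit: Optional[int],
    fetch_live: Callable[[], Optional[int]],
    fallback: Optional[int] = FALLBACK_CAP,
) -> Optional[int]:
    """Pick the cap: explicit override first, then live lookup, then fallback."""
    if explicit is not None:
        # An explicit value (for example from --cap) wins and skips the lookup.
        return explicit
    live = fetch_live()
    return live if live is not None else fallback


# Usage: the lookup is injected, so tests can simulate each outcome.
print(resolve_cap(20, lambda: 60))      # explicit override
print(resolve_cap(None, lambda: 180))   # live value
print(resolve_cap(None, lambda: None))  # lookup failed, fallback used
```

Passing `fallback=None` gives the behaviour of the follow-up commit in this conversation, where a failed lookup is reported as an unknown cap instead of a guessed one.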