CI analytics: derive hosted-runner cap from org plan tier (Team=60) #11895

Open
jkiviluoto-nv wants to merge 2 commits into shader-slang:master from jkiviluoto-nv:analytics-dynamic-runner-cap


@jkiviluoto-nv

Contributor

Motivation

The Slang org moved to GitHub Team, which raises the GitHub-hosted-runner concurrency cap from the Free-tier 20 to 60 (gh api orgs/shader-slang --jq .plan returns {"name":"team", ...}). The health-dashboard sampler (extras/ci/analytics/ci_hosted_runner_usage.py) hard-coded DEFAULT_HOSTED_RUNNER_CAP = 20, so every "hosted runners in use" reading, the AT CAP/HIGH banners, and the stacked usage chart were computed against a stale cap: a run at 30/60 (healthy, 50%) would have shown as 30/20 (150%, false alarm).

Proposed solution

Rather than bump the constant to 60 (which goes stale again on the next plan change), query the cap dynamically from the org's plan tier. The concurrency cap isn't exposed directly by any API, but it is a fixed function of the plan, so orgs/<org>.plan.name maps cleanly:

| Plan tier | Hosted-runner concurrency cap |
| --- | --- |
| free | 20 |
| team | 60 |
| enterprise | 180 |

fetch_org_plan_cap(org) performs the lookup and returns None (never raises) on any API error, a missing plan field (external/fork tokens lack org visibility), or an unrecognized tier. resolve_hosted_runner_cap(repo) prefers the live value and falls back to DEFAULT_HOSTED_RUNNER_CAP, which is now set to the Team value (60) — the org's actual plan — so even the fallback is correct today.
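
The lookup-then-fallback flow described above could look roughly like the sketch below. It is not the PR's actual code: the `fetch` parameter is a hypothetical stand-in for the repo's `gh_api` helper, assumed here to return a `(data, error)` pair, and the function bodies are illustrative only.

```python
from typing import Callable, Optional, Tuple

# Plan-tier map and fallback value as stated in this PR description.
PLAN_TIER_HOSTED_RUNNER_CAP = {"free": 20, "team": 60, "enterprise": 180}
DEFAULT_HOSTED_RUNNER_CAP = PLAN_TIER_HOSTED_RUNNER_CAP["team"]

# Hypothetical fetcher type (an assumption): API path in, (data, error) out.
Fetch = Callable[[str], Tuple[Optional[dict], Optional[str]]]


def org_from_repo(repo: str) -> str:
    """Return the owner part of an "owner/name" repo slug."""
    return repo.split("/", 1)[0]


def fetch_org_plan_cap(org: str, fetch: Fetch) -> Optional[int]:
    """Map the org's plan tier to a cap; None on any failure reported by fetch."""
    data, err = fetch(f"orgs/{org}")
    if err or not isinstance(data, dict):
        return None  # API error or unexpected response shape
    plan = data.get("plan")
    tier = plan.get("name") if isinstance(plan, dict) else None
    if not isinstance(tier, str) or not tier:
        return None  # no plan field, e.g. token without org visibility
    # Unrecognized tiers fall through to None via dict.get.
    return PLAN_TIER_HOSTED_RUNNER_CAP.get(tier.lower())


def resolve_hosted_runner_cap(repo: str, fetch: Fetch) -> int:
    """Prefer the live plan-derived cap, else the static fallback."""
    cap = fetch_org_plan_cap(org_from_repo(repo), fetch)
    return cap if cap is not None else DEFAULT_HOSTED_RUNNER_CAP


# Usage with fake fetchers (no network access needed).
def fake_team(path):
    return {"plan": {"name": "Team"}}, None


def fake_no_plan(path):
    return {"login": "shader-slang"}, None


print(fetch_org_plan_cap("shader-slang", fake_team))     # 60
print(fetch_org_plan_cap("shader-slang", fake_no_plan))  # None
print(resolve_hosted_runner_cap("shader-slang/slang", fake_no_plan))  # 60
```

Injecting the fetcher keeps the sketch testable offline; the real helper may be wired differently.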

sample_hosted_runner_usage(repo, cap=None) auto-detects when cap is None; an explicit integer (from --cap or a test) still overrides. Downstream consumers (ci_health.py rendering, chart building, CLI summary) already read the cap out of the snapshot and use percentage-based thresholds (warn ≥80%, alarm at cap), so they scale to the new value with no further edits.
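
To make the percentage-based thresholds concrete, here is a small sketch of how a reading might be classified against the cap. The function name `usage_status` and the "OK" label are assumptions for illustration; only the warn-at-80% and alarm-at-cap rules and the HIGH / AT CAP banner names come from the description above.

```python
from typing import Tuple


def usage_status(in_use: int, cap: int) -> Tuple[float, str]:
    """Return (percent of cap, status label) for one usage reading."""
    pct = 100.0 * in_use / cap
    if in_use >= cap:
        return pct, "AT CAP"   # alarm: at or over the cap
    if in_use * 100 >= cap * 80:
        return pct, "HIGH"     # warn: at least 80% of the cap
    return pct, "OK"           # hypothetical label for the healthy case


# The same reading of 30 runners against the correct and the stale cap.
print(usage_status(30, 60))  # (50.0, 'OK')
print(usage_status(30, 20))  # (150.0, 'AT CAP')
```

Because the thresholds are ratios of the cap, changing the cap rescales them without touching the consumers.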

Change summary

| File | Change |
| --- | --- |
| extras/ci/analytics/ci_hosted_runner_usage.py | Add PLAN_TIER_HOSTED_RUNNER_CAP map, org_from_repo, fetch_org_plan_cap, resolve_hosted_runner_cap; sample_hosted_runner_usage(cap=None) auto-detects; DEFAULT_HOSTED_RUNNER_CAP → 60 (Team); --cap default → None (auto-detect) |
| extras/ci/analytics/ci_health.py | Health run notes (plan not queryable, using fallback) when it lands on the fallback cap |
| extras/ci/analytics/tests/test_ci_analytics.py | New TestHostedRunnerCapResolution (8 tests): org parse, all three tiers + case-insensitivity, no-plan / API-error / unknown-tier fallbacks, auto-detect path, fallback == 60 |

Verification

  • Full analytics suite: 96 passed.
  • Live run auto-detected cap = 60, no stderr warnings; real usage rendered 15 / 60 (25%).
  • --cap 20 override and py_compile / pyflakes clean.

The Slang org moved to GitHub Team, raising the GitHub-hosted-runner
concurrency cap from the Free-tier 20 to 60. The analytics sampler
hard-coded 20, so the health dashboard reported usage against a stale
cap.

Rather than bump the constant, query the cap dynamically from the org's
plan tier so future plan changes are picked up with no code edit. The
cap isn't exposed directly by any API, but it is a fixed function of the
plan: `orgs/<org>.plan.name` maps through free=20, team=60,
enterprise=180. `fetch_org_plan_cap` performs that lookup and returns
None (never raises) on API error, a missing `plan` field (external/fork
tokens lack org visibility), or an unrecognized tier; the caller then
falls back to DEFAULT_HOSTED_RUNNER_CAP, now set to the Team value.

sample_hosted_runner_usage(repo, cap=None) auto-detects when cap is None
and still accepts an explicit override from --cap or tests. Downstream
consumers already read the cap from the snapshot and use percentage
thresholds, so they scale automatically.
@jkiviluoto-nv jkiviluoto-nv requested a review from a team as a code owner July 2, 2026 05:30
@jkiviluoto-nv jkiviluoto-nv requested review from bmillsNV and removed request for a team July 2, 2026 05:30
@jkiviluoto-nv jkiviluoto-nv added the pr: non-breaking PRs without breaking changes label Jul 2, 2026
@jhelferty-nv jhelferty-nv removed the request for review from bmillsNV July 2, 2026 05:30
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Contributor

📝 Walkthrough

Hosted-runner cap handling now auto-detects the org plan tier, preserves unknown caps instead of guessing, and updates CLI, chart, console, and HTML output to render known and unknown cap cases differently.

Changes

Hosted runner cap auto-detection

| Layer / File(s) | Summary |
| --- | --- |
| Plan-tier cap mapping and constants / extras/ci/analytics/ci_hosted_runner_usage.py | Adds plan-tier cap mapping, removes the fixed default cap constant, and updates docs and CLI usage text. |
| Cap resolution and sampler wiring / extras/ci/analytics/ci_hosted_runner_usage.py | Adds org parsing and GitHub plan lookup helpers, and makes sampler cap resolution optional with cap=None preserved when undetectable. |
| Health rendering and console output / extras/ci/analytics/ci_health.py | Updates chart, tooltip, banner, summary, and console output to handle known-cap and unknown-cap states separately. |
| Cap resolution tests / extras/ci/analytics/tests/test_ci_analytics.py | Adds tests for plan-tier mapping, fallback behavior, unknown-cap rendering, and chart data propagation. |

Suggested reviewers: bmillsNV, jvepsalainen-nv

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: deriving the hosted-runner cap from the org plan tier. |
| Description check | ✅ Passed | The description directly matches the changeset and explains the dynamic hosted-runner cap update. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c27465de-6ba3-475a-a29d-474d09a13a2f

📥 Commits

Reviewing files that changed from the base of the PR and between e0e029d and 678fa3b.

📒 Files selected for processing (3)
  • extras/ci/analytics/ci_health.py
  • extras/ci/analytics/ci_hosted_runner_usage.py
  • extras/ci/analytics/tests/test_ci_analytics.py

Comment thread extras/ci/analytics/ci_health.py Outdated
Comment on lines +33 to +51
 from gh_api import gh_api, gh_api_list
 
 DEFAULT_REPO = "shader-slang/slang"
 
-# The Slang org runs on the standard public-repo concurrent-runner cap
-# of 20 hosted runners shared across all labels. The cap is per-org,
-# not per-label.
-DEFAULT_HOSTED_RUNNER_CAP = 20
+# GitHub's standard concurrent-runner cap for GitHub-hosted runners, by
+# plan tier. This is the total number of hosted runners an account can
+# run at once across all labels; it is a per-account limit, not
+# per-label and not per-repo. Values are GitHub's published standard
+# limits (https://docs.github.com/actions/reference/usage-limits).
+PLAN_TIER_HOSTED_RUNNER_CAP = {
+    "free": 20,
+    "team": 60,
+    "enterprise": 180,
+}
+
+# Fallback cap used only when the org plan cannot be queried. Set to the
+# Team-tier value because the Slang org is on GitHub Team (60 concurrent
+# hosted runners); see the plan map above.
+DEFAULT_HOSTED_RUNNER_CAP = PLAN_TIER_HOSTED_RUNNER_CAP["team"]

Contributor

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

python3 - <<'PY'
from pathlib import Path
p = Path('extras/ci/analytics/ci_hosted_runner_usage.py')
print(p.exists(), p)
if p.exists():
    for i, line in enumerate(p.read_text().splitlines(), 1):
        if 25 <= i <= 60:
            print(f"{i:4}: {line}")
PY

Repository: shader-slang/slang

Length of output: 1675


🌐 Web query:

GitHub Actions usage limits Enterprise Cloud standard runners 500 concurrent jobs 180 hosted runners official docs

💡 Result:

For GitHub Enterprise Cloud plans, the default job concurrency limit for standard GitHub-hosted runners is 500 total concurrent jobs [1][2]. Within this limit, there is a maximum of 50 concurrent jobs allowed for macOS runners [1][2]. These concurrency limits are subject to the following details:

  • Concurrency Increases: GitHub Support can increase these job concurrency limits upon request via a support ticket [1][2].
  • Larger Runners: These limits apply to standard GitHub-hosted runners. Larger runners have different concurrency configurations, generally allowing up to 1,000 concurrent jobs (with specific limits per runner type and architecture, such as 50 concurrent jobs for macOS larger runners) [1][2]. Note that the maximum concurrent macOS jobs limit is shared across both standard GitHub-hosted runners and GitHub-hosted larger runners [1][2].
  • Other Limits: Beyond job concurrency, other usage limits for GitHub Enterprise Cloud include 50,000 included minutes per month and 50 GB of artifact storage [1][3].

Citations:


🌐 Web query:

site:docs.github.com/actions/reference/usage-limits GitHub-hosted runners concurrent standard runners Enterprise Cloud 500 180

💡 Result:

For GitHub Enterprise Cloud, the standard GitHub-hosted runner concurrency limit is 500 concurrent jobs, with 50 concurrent macOS jobs. (docs.github.com)

If you meant larger runners, that’s a different limit/model. (docs.github.com)

Citations:


Update the Enterprise cap to 500.
extras/ci/analytics/ci_hosted_runner_usage.py:42-45 sets PLAN_TIER_HOSTED_RUNNER_CAP["enterprise"] = 180, but GitHub’s current limits page lists Enterprise Cloud standard GitHub-hosted runners at 500 concurrent jobs (50 macOS). This value feeds the reported headroom, so Enterprise orgs will be understated until it’s corrected.

Comment thread extras/ci/analytics/ci_hosted_runner_usage.py
… chart

Two follow-ups to the dynamic-cap change:

Unknown cap instead of a guessed fallback. When the org plan can't be
queried (external/fork token, API error, unrecognized tier),
resolve_hosted_runner_cap now returns None rather than substituting the
Team value. A wrong denominator (reporting N/60 when the real cap might
be 20) silently mis-scales every reading, so it's worse than none. The
CLI summary drops the "/ cap (pct%)" and warns; the health page shows an
UNKNOWN CAP banner with raw usage and an explanatory note.
DEFAULT_HOSTED_RUNNER_CAP is removed; PLAN_TIER_HOSTED_RUNNER_CAP is the single
source of truth and "unknown" is now modeled as None end to end.

Readable usage chart. A full-height dashed cap line at 60 forced the
y-axis to auto-scale to the cap, crushing the real usage band (usually
10-25) into an unreadable sliver. Replace it with faint amber/red
threshold zones (>=80% / >=100% of cap) painted by a beforeDraw plugin
relative to the current y-scale: the axis now fits the data, and the
zones only appear once usage climbs into the warn range. A subtitle
states the cap and thresholds. When the cap is unknown, no zones or
subtitle are drawn.
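
The zone geometry described above can be sketched in Python as follows. This only illustrates the clipping logic, under the assumption that each zone is a horizontal band clipped to the visible y-range; the actual change is described as a beforeDraw chart plugin, and the function name `visible_threshold_zones` and the band labels are hypothetical.

```python
from typing import List, Optional, Tuple


def visible_threshold_zones(
    y_min: float, y_max: float, cap: Optional[int]
) -> List[Tuple[str, float, float]]:
    """Return (label, bottom, top) bands clipped to the visible y-range."""
    if cap is None:
        return []  # unknown cap: draw no zones at all
    warn_start = cap * 4 / 5  # 80% of the cap
    bands = [
        ("amber", warn_start, float(cap)),    # warn zone: 80% up to the cap
        ("red", float(cap), float("inf")),    # alarm zone: at or above the cap
    ]
    zones = []
    for label, low, high in bands:
        bottom = max(low, y_min)
        top = min(high, y_max)
        if top > bottom:  # keep only bands that intersect the visible axis
            zones.append((label, bottom, top))
    return zones


# With the axis fitted to typical usage, nothing is painted.
print(visible_threshold_zones(0, 30, 60))  # []
# Once the axis reaches the warn range, the amber band appears.
print(visible_threshold_zones(0, 55, 60))  # [('amber', 48.0, 55)]
```

Since the bands are computed from the current axis range rather than forcing the axis to include the cap, the y-axis stays fitted to the data.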
@jkiviluoto-nv

Contributor Author

Follow-up commit e5c44e0 addresses two points raised in review:

1. Unknown cap instead of a guessed fallback. When the org plan can't be queried (external/fork token, API error, unrecognized tier), the cap is now reported as unknown (None) end-to-end rather than falling back to a plausible-but-unverified number. A wrong denominator (N / 60 when the real cap might be 20) silently mis-scales every reading, so no denominator is safer than a wrong one:

  • CLI summary drops the / cap (pct%) and prints a warning.
  • Health page shows an UNKNOWN CAP banner with raw usage + an explanatory note.
  • DEFAULT_HOSTED_RUNNER_CAP is removed; PLAN_TIER_HOSTED_RUNNER_CAP is the single source of truth.

2. Readable usage chart. A full-height dashed cap line at 60 forced the y-axis to auto-scale to the cap, crushing the real usage band (typically ~10–25) into an unreadable sliver at the bottom. It's replaced with faint amber/red threshold zones (≥80% / ≥100% of cap) painted by a beforeDraw plugin relative to the current y-scale — so the axis fits the data and the zones only appear once usage climbs into the warn range. A subtitle states the cap and thresholds. No zones/subtitle when the cap is unknown.

100 analytics tests pass.
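
As an illustration of modeling "unknown" as None end to end, a summary formatter might branch as in the sketch below. The function name `format_usage_summary` and the exact wording of the output are assumptions, not the PR's actual strings.

```python
import sys
from typing import Optional


def format_usage_summary(in_use: int, cap: Optional[int]) -> str:
    """Build the one-line usage summary; omit the denominator if cap is unknown."""
    if cap is None:
        # No "/ cap (pct%)" part: a guessed denominator would mis-scale the reading.
        print("Warning: hosted-runner cap is unknown.", file=sys.stderr)
        return f"Hosted runners in use: {in_use}"
    pct = round(100 * in_use / cap)
    return f"Hosted runners in use: {in_use} / {cap} ({pct}%)"


print(format_usage_summary(15, 60))    # Hosted runners in use: 15 / 60 (25%)
print(format_usage_summary(15, None))  # Hosted runners in use: 15
```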

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
extras/ci/analytics/ci_hosted_runner_usage.py (1)

236-257: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Stale "using fallback cap" warnings — no fallback exists anymore.

Both warnings in fetch_org_plan_cap claim a fallback is being used, but the function returns None immediately afterwards. Per the PR's own stated goal (and the function's docstring at Lines 230-233), there is no fallback anymore; the cap becomes "unknown". These messages will mislead anyone reading stderr into thinking a guessed cap was substituted, when in fact resolve_hosted_runner_cap/sample_hosted_runner_usage will report the cap as None downstream.

🐛 Proposed fix
     data, err = gh_api(f"orgs/{org}")
     if err or not isinstance(data, dict):
         print(
             f"Warning: could not query plan for org {org}: "
-            f"{err or 'unexpected response'}; using fallback cap.",
+            f"{err or 'unexpected response'}; cap will be reported as unknown.",
             file=sys.stderr,
         )
         return None
     plan = data.get("plan")
     tier = plan.get("name") if isinstance(plan, dict) else None
     if not tier:
         # No `plan` field means the token lacks org visibility. Don't warn
         # loudly — this is expected for external/fork tokens.
         return None
     cap = PLAN_TIER_HOSTED_RUNNER_CAP.get(tier.lower())
     if cap is None:
         print(
             f"Warning: unrecognized GitHub plan tier {tier!r} for org "
-            f"{org}; using fallback cap.",
+            f"{org}; cap will be reported as unknown.",
             file=sys.stderr,
         )
         return None

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ba33eaf4-e055-45d7-a374-27b68c8bc595

📥 Commits

Reviewing files that changed from the base of the PR and between 678fa3b and e5c44e0.

📒 Files selected for processing (3)
  • extras/ci/analytics/ci_health.py
  • extras/ci/analytics/ci_hosted_runner_usage.py
  • extras/ci/analytics/tests/test_ci_analytics.py


Labels

pr: non-breaking PRs without breaking changes
