
Commit 46fb01d

Implement static threshold analysis tool
Creates a tool that performs static threshold checks on benchmark results. If a check fails, the workflow fails, indicating a regression.
1 parent 25a9ecc commit 46fb01d

15 files changed: +822 -164 lines

.github/workflows/run-benchmarks.yaml

Lines changed: 34 additions & 11 deletions
```diff
@@ -20,6 +20,14 @@ on:
       description: "The workflow type to run (e.g., PRESUBMIT)."
       required: true
       type: string
+    ml_actions_ref:
+      description: >
+        The branch, tag, or SHA of google-ml-infra/actions to use. Defaults to 'main'.
+        Note: For runs triggered from within the google-ml-infra/actions repo
+        (e.g., a PR), the commit SHA (github.sha) is used automatically to test changes.
+      required: false
+      type: string
+      default: 'main'
 
 permissions:
   contents: read # Required for actions/checkout.
@@ -33,7 +41,7 @@ jobs:
     outputs:
       matrix: ${{ steps.generate.outputs.matrix }}
     steps:
-      - name: Checkout user repo
+      - name: Check out user repo
         uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # ratchet:actions/checkout@v5
         with:
           repository: ${{ github.repository }}
@@ -45,7 +53,7 @@
         env:
           USER_REPO: ${{ github.repository }}
           USER_REPO_SHA: ${{ github.sha }}
-          WORKFLOW_REF: ${{ github.workflow_ref }}
+          ML_ACTIONS_REF_INPUT: ${{ inputs.ml_actions_ref }}
           ML_ACTIONS_REPO: google-ml-infra/actions
         id: extract_ml_actions_repo_ref
         shell: bash
@@ -57,13 +65,13 @@
             # Use SHA for ML Actions PRs for reproducibility.
             REPO_REF="$USER_REPO_SHA"
           else
-            # For external repos, use ref (e.g. @v1).
-            REPO_REF=$(echo "$WORKFLOW_REF" | cut -d'@' -f2)
+            # For external repos, use provided ref.
+            REPO_REF="$ML_ACTIONS_REF_INPUT"
           fi
-
+
           echo "repo_ref=$REPO_REF" >> "$GITHUB_OUTPUT"
 
-      - name: Checkout ML actions repo
+      - name: Check out ML actions repo
         uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # ratchet:actions/checkout@v5
         with:
           repository: 'google-ml-infra/actions'
@@ -103,7 +111,7 @@
       image: ${{ matrix.container_image }}
 
     steps:
-      - name: Checkout user repo
+      - name: Check out user repo
         uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # ratchet:actions/checkout@v5
         with:
           repository: ${{ github.repository }}
@@ -115,7 +123,7 @@
         env:
           USER_REPO: ${{ github.repository }}
           USER_REPO_SHA: ${{ github.sha }}
-          WORKFLOW_REF: ${{ github.workflow_ref }}
+          ML_ACTIONS_REF_INPUT: ${{ inputs.ml_actions_ref }}
           ML_ACTIONS_REPO: google-ml-infra/actions
         id: extract_ml_actions_repo_ref
         shell: bash
@@ -127,13 +135,13 @@
             # Use SHA for ML Actions PRs for reproducibility.
             REPO_REF="$USER_REPO_SHA"
           else
-            # For external repos, use ref (e.g. @v1).
-            REPO_REF=$(echo "$WORKFLOW_REF" | cut -d'@' -f2)
+            # For external repos, use provided ref.
+            REPO_REF="$ML_ACTIONS_REF_INPUT"
           fi
 
           echo "repo_ref=$REPO_REF" >> "$GITHUB_OUTPUT"
 
-      - name: Checkout ML actions repo
+      - name: Check out ML actions repo
         uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # ratchet:actions/checkout@v5
         with:
           repository: 'google-ml-infra/actions'
@@ -198,3 +206,18 @@
         with:
           name: benchmark-result-${{ matrix.config_id }}
           path: ${{ steps.parse_tb_logs.outputs.artifact_path }}
+
+      - name: Run static threshold analyzer
+        env:
+          BENCHMARK_RESULT_FILE: ${{ steps.parse_tb_logs.outputs.artifact_path }}
+          METRICS_MANIFEST_JSON: '${{ toJson(matrix.metrics) }}'
+        id: static_threshold_analyzer
+        shell: bash
+        run: |
+          set -euo pipefail
+          ML_ACTIONS_REPO="$GITHUB_WORKSPACE/ml_actions"
+          cd "$ML_ACTIONS_REPO" || exit 1
+
+          bazel run //benchmarking/static_threshold_analyzer -- \
+            --metrics_manifest_json="$METRICS_MANIFEST_JSON" \
+            --benchmark_result_file="$BENCHMARK_RESULT_FILE"
```

benchmarking/config/gha_runners.json

Lines changed: 8 additions & 0 deletions
```diff
@@ -7,6 +7,14 @@
       { "label": "linux-arm64-c4a-16", "os": "LINUX", "vcpu": 16 }
     ]
   },
+  "google-ml-infra/torch_tpu": {
+    "CPU_X86": [
+      { "label": "linux-x86-n2-16", "os": "LINUX", "vcpu": 16 }
+    ],
+    "CPU_ARM64": [
+      { "label": "linux-arm64-c4a-16", "os": "LINUX", "vcpu": 16 }
+    ]
+  },
   "openxla/tokamax": {
     "CPU_X86": [
       { "label": "linux-x86-n2-16", "os": "LINUX", "vcpu": 16 },
```

benchmarking/docs/onboarding.md

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@

# Onboarding Guide: ML Benchmarking

This guide provides the steps to add a project's benchmarks to the ML Benchmarking Infrastructure. ML Benchmarking is GitHub-native and uses GitHub Actions to administer benchmarks.

The system is agnostic to the benchmark workload, supporting anything from Bazel targets to Python-based scripts.

The system follows two simple contracts:

1. Input: a `benchmark_registry.pbtxt` (a "manifest") defining benchmark requirements.

2. Output: benchmark scripts write raw metric data via TensorBoard.

Our infrastructure handles the following:

- Provisioning the correct GitHub Actions runners.
- Converting defined benchmarks and hardware requirements into GitHub Actions jobs.
- Building workloads and installing dependencies.
- Parsing TensorBoard logs and computing statistics.
- Running static threshold analysis.

## Step 1: Create a workflow file

First, in your own repository, create a new workflow file in `.github/workflows/` for running benchmarks.

```yaml
name: Run presubmit benchmarks

on:
  pull_request:
    paths:
      - 'benchmarking/**'

permissions:
  contents: read

jobs:
  run_benchmarks:
    uses: google-ml-infra/actions/.github/workflows/run-benchmarks.yaml@<commit | branch | tag>
    with:
      registry_file: "benchmarking/my_registry.pbtxt"
      workflow_type: "PRESUBMIT"
      ml_actions_ref: <commit | branch | tag>
```

### Required permissions

The caller workflow must set `permissions: contents: read`. The reusable workflow's access token inherits its permissions from the caller, so the caller must explicitly grant the read access that `actions/checkout` needs to succeed.

### ml_actions_ref

Specify the `ml_actions_ref` input; if omitted, it defaults to `main`.

This value tells the reusable workflow which version (branch, tag, or SHA) of the google-ml-infra/actions repository to check out for its internal scripts (e.g., `install_pip_deps.sh`).

For production, use the same stable tag or SHA as in the main `uses` line, so the workflow file and the scripts it checks out are pinned to the same version.

### Workflow granularity

We recommend creating a dedicated workflow file for each distinct [workflow_type](https://github.com/google-ml-infra/actions/blob/main/benchmarking/proto/benchmark_registry.proto#L112) you plan to support (PRESUBMIT, NIGHTLY, PERIODIC, etc.) to better control scheduling, triggers, and resource allocation.

## Step 2: Create a benchmark registry

Next, create a benchmark registry file (`.pbtxt`). It defines which benchmarks to run and how to run them, based on the [benchmark_registry.proto](https://github.com/google-ml-infra/actions/blob/main/benchmarking/proto/benchmark_registry.proto) schema.

A key part of the registry is defining metrics. The `metrics.name` field is required and must exactly match the tag name used in the TensorBoard logs generated by your benchmark script (covered in the next step). Within the metrics block, you specify the statistics (`stats`) to compute (e.g., MEAN, P99) and can optionally configure static threshold analysis using the `comparison` block.

### Example 1: Bazel workload

```proto
benchmarks {
  name: "my_bazel_benchmark"
  description: "Runs a simple Bazel target."
  owner: "my-team"

  workload {
    bazel_workload {
      execution_target: "//my_project:my_benchmark_binary"
    }
    runtime_flags: "--model_name=resnet"
  }

  hardware_configs {
    hardware_category: CPU_X86
    topology { num_hosts: 1, num_devices_per_host: 1 }
    workflow_type: [PRESUBMIT]

    # Add hardware-specific runtime flags
    runtime_flags: "--precision=fp32"
  }

  update_frequency_policy: QUARTERLY

  metrics {
    # REQUIRED: Must match the TensorBoard tag name (e.g., 'wall_time' in the log)
    name: "wall_time"
    unit: "ms"

    stats {
      stat: MEAN
      comparison: {
        # Configures static threshold analysis against a baseline
        baseline { value: 100.0 }
        threshold { value: 0.1 }
        improvement_direction: LESS
      }
    }

    stats {
      stat: P99
    }
  }
}
```
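
The `comparison` block above drives the static threshold check that runs after the benchmark completes. As a rough sketch only (the authoritative logic lives in the static threshold analyzer the workflow invokes, and this assumes `threshold` is a relative tolerance), the check for the MEAN stat behaves roughly like this hypothetical helper:

```python
def check_threshold(observed: float, baseline: float, threshold: float,
                    improvement_direction: str) -> bool:
    """Hypothetical sketch of a static threshold check.

    Assumes `threshold` is a relative tolerance (0.1 == 10%); the real
    analyzer may differ in detail.
    """
    if improvement_direction == "LESS":
        # Lower is better: pass as long as observed <= baseline * (1 + threshold).
        return observed <= baseline * (1.0 + threshold)
    if improvement_direction == "GREATER":
        # Higher is better: pass as long as observed >= baseline * (1 - threshold).
        return observed >= baseline * (1.0 - threshold)
    raise ValueError(f"Unknown improvement_direction: {improvement_direction}")


# With Example 1's values (baseline 100.0 ms, threshold 0.1, LESS):
assert check_threshold(105.0, 100.0, 0.1, "LESS")      # within tolerance
assert not check_threshold(111.0, 100.0, 0.1, "LESS")  # regression; the job fails
```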

### Example 2: Python workload

```proto
benchmarks {
  name: "my_python_benchmark"
  description: "Runs a pip-based Python script."
  owner: "my-team"

  workload {
    python_workload {
      script_path: "benchmarking/scripts/run_pallas.py"
      python_version: "3.11"

      # The directory containing your pyproject.toml
      pip_project_path: "."

      # Optional dependencies from your pyproject.toml [project.optional-dependencies]
      pip_optional_dependencies: "test"
    }
    runtime_flags: "--model_name=my_kernel"
  }

  hardware_configs {
    hardware_category: GPU_L4
    topology { num_hosts: 1, num_devices_per_host: 1 }
    workflow_type: [PRESUBMIT]

    # Add hardware-specific optional dependencies
    pip_optional_dependencies: "cuda"

    # Add hardware-specific runtime flags
    runtime_flags: "--use_gpu"
  }

  update_frequency_policy: QUARTERLY

  metrics {
    # REQUIRED: Must match the TensorBoard tag name
    name: "throughput"
    unit: "samples/sec"

    stats {
      stat: MEAN
      comparison: {
        baseline { value: 5000.0 }
        threshold { value: 0.05 }
        improvement_direction: GREATER
      }
    }

    stats {
      stat: MIN
    }
  }
}
```

## Step 3: Log metrics via TensorBoard

To integrate with the infrastructure, your benchmark script must log its metrics via TensorBoard.

The reusable workflow provides a standard environment variable, `TENSORBOARD_OUTPUT_DIR`, which points to the directory where TensorBoard must write event files.

Example script:

```python
import tensorflow as tf
import os
import sys
import numpy as np

# Get the output directory from the infrastructure.
tblog_dir = os.environ.get("TENSORBOARD_OUTPUT_DIR")

if not tblog_dir:
    print("Error: TENSORBOARD_OUTPUT_DIR env var not set.", file=sys.stderr)
    sys.exit(1)

print("Running benchmark...")
fake_data = np.array([101.2, 100.5, 102.1, 99.8, 101.5])

# Write the raw data to TensorBoard.
try:
    writer = tf.summary.create_file_writer(tblog_dir)
    with writer.as_default():
        for i, value in enumerate(fake_data):
            # The tag "wall_time" MUST match the "name" in your MetricSpec.
            tf.summary.scalar("wall_time", value, step=i)

    writer.flush()
    writer.close()
    print("Successfully wrote metrics.")

except Exception as e:
    print(f"Error writing TensorBoard logs: {e}", file=sys.stderr)
    sys.exit(1)
```
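
For reference, the infrastructure's log-parsing step later reads these event files back and computes the statistics requested in your registry. The sketch below is illustrative only: the `summarize` helper is hypothetical, and the actual parser may use different APIs and output formats.

```python
import numpy as np
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def summarize(tblog_dir: str, tag: str) -> dict:
    """Reads scalar events for `tag` and computes a few example statistics."""
    acc = EventAccumulator(tblog_dir)
    acc.Reload()
    values = np.array([event.value for event in acc.Scalars(tag)])
    return {
        "MEAN": float(values.mean()),
        "MIN": float(values.min()),
        "P99": float(np.percentile(values, 99)),
    }

# The tag must match the `name` field in the registry's metrics block.
print(summarize("/tmp/tb_logs", "wall_time"))
```
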
## Step 4: Add repository to runner registry

Finally, the infrastructure needs to know what hardware is available to your repository. This is defined in a central "allowlist" file: [gha_runners.json](https://github.com/google-ml-infra/actions/blob/main/benchmarking/config/gha_runners.json).

If your workflow fails with an error like `Error: No runner pool defined for repository 'my-org/my-repo'`, please file a bug to have your repository and its available runners added to this file.

benchmarking/e2e_test/BUILD.bazel

Lines changed: 10 additions & 9 deletions
```diff
@@ -12,13 +12,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-filegroup(
-    name = "e2e_test_files",
-    srcs = glob([
-        "*.py",
-        "*.pbtxt",
-        "pyproject.toml",
-        "requirements.lock"
-    ]),
-    visibility = ["//visibility:public"],
+load("@rules_python//python:py_binary.bzl", "py_binary")
+
+py_binary(
+    name = "test_benchmark",
+    main = "run_benchmark.py",
+    srcs = ["run_benchmark.py"],
+    deps = [
+        "@pypi//tensorflow",
+        "@pypi//setuptools"
+    ],
 )
```

benchmarking/e2e_test/pyproject.toml

Lines changed: 0 additions & 12 deletions
This file was deleted.
