# Onboarding Guide: ML Benchmarking

This guide provides the steps to add a project's benchmarks to the ML Benchmarking Infrastructure. ML Benchmarking is GitHub-native and uses GitHub Actions to administer benchmarks.

The system is workload-agnostic: it supports any benchmark workload, such as Bazel targets or Python-based scripts.

The system follows two simple contracts:

1. Input: A `benchmark_registry.pbtxt` file (a "manifest") defining benchmark requirements.

2. Output: Benchmark scripts write raw metric data via TensorBoard.

Our infrastructure handles the following:

- Provisioning the correct GitHub Actions runners.
- Converting defined benchmarks and hardware requirements into GitHub Actions jobs.
- Building the workload and installing its dependencies.
- Parsing TensorBoard logs and computing statistics.
- Performing static threshold analysis.

## Step 1: Create a workflow file

First, in your own repository, create a new workflow file in `.github/workflows/` for running benchmarks.

```yaml
name: Run presubmit benchmarks

on:
  pull_request:
    paths:
      - 'benchmarking/**'

permissions:
  contents: read

jobs:
  run_benchmarks:
    uses: google-ml-infra/actions/.github/workflows/run_benchmarks.yml@<commit | branch | tag>
    with:
      registry_file: "benchmarking/my_registry.pbtxt"
      workflow_type: "PRESUBMIT"
      ml_actions_ref: <commit | branch | tag>
```

### Required permissions

The `contents: read` permission is required in the caller workflow. The reusable workflow's access token inherits its permissions from the caller, so the caller must explicitly grant the read access that `actions/checkout` needs to succeed.
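If you prefer not to grant permissions at the workflow level, GitHub Actions also accepts a `permissions` block on the calling job itself. A minimal sketch, reusing the same inputs as the example above:

```yaml
jobs:
  run_benchmarks:
    # Scope the token grant to this job only instead of the whole workflow.
    permissions:
      contents: read
    uses: google-ml-infra/actions/.github/workflows/run_benchmarks.yml@<commit | branch | tag>
    with:
      registry_file: "benchmarking/my_registry.pbtxt"
      workflow_type: "PRESUBMIT"
      ml_actions_ref: <commit | branch | tag>
```
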
### ml_actions_ref

Always set the `ml_actions_ref` input; if omitted, it defaults to `main`.

This value tells the reusable workflow which version (branch, tag, or SHA) of the `google-ml-infra/actions` repository to check out for its internal scripts (e.g., `install_pip_deps.sh`).

For production, use the same stable tag or SHA as in the main `uses` line, so the workflow file and the scripts it checks out stay pinned to the same version.
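For example, a pinned production configuration might look like the following (the SHA shown is a placeholder, not a real commit):

```yaml
jobs:
  run_benchmarks:
    # Pin both the reusable workflow and its helper scripts to the same commit.
    # "0a1b2c3d4e5f60718293a4b5c6d7e8f901234567" is a placeholder SHA.
    uses: google-ml-infra/actions/.github/workflows/run_benchmarks.yml@0a1b2c3d4e5f60718293a4b5c6d7e8f901234567
    with:
      registry_file: "benchmarking/my_registry.pbtxt"
      workflow_type: "PRESUBMIT"
      ml_actions_ref: 0a1b2c3d4e5f60718293a4b5c6d7e8f901234567
```
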
### Workflow granularity

We recommend creating a dedicated workflow file for each distinct [workflow_type](https://github.com/google-ml-infra/actions/blob/main/benchmarking/proto/benchmark_registry.proto#L112) you plan to support (PRESUBMIT, NIGHTLY, PERIODIC, etc.) to better control scheduling, triggers, and resource allocation.
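For instance, a nightly counterpart to the presubmit workflow above could look like the sketch below. The cron schedule and registry path are illustrative, and it assumes your registry declares benchmarks with `workflow_type: [NIGHTLY]`:

```yaml
name: Run nightly benchmarks

on:
  schedule:
    # Illustrative schedule: every day at 03:00 UTC.
    - cron: '0 3 * * *'

permissions:
  contents: read

jobs:
  run_benchmarks:
    uses: google-ml-infra/actions/.github/workflows/run_benchmarks.yml@<commit | branch | tag>
    with:
      registry_file: "benchmarking/my_registry.pbtxt"
      workflow_type: "NIGHTLY"
      ml_actions_ref: <commit | branch | tag>
```
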
## Step 2: Create benchmark registry

Next, create a benchmark registry file (`.pbtxt`). It defines what benchmarks to run and how to run them, based on the [benchmark_registry.proto](https://github.com/google-ml-infra/actions/blob/main/benchmarking/proto/benchmark_registry.proto) schema.

A key part of the registry is defining metrics. The `metrics.name` field is required and must exactly match the tag name used in the TensorBoard logs generated by your benchmark script (covered in the next step). Within the metrics block, you specify the statistics (stats) to calculate (e.g., MEAN, P99) and can optionally configure static threshold analysis using the comparison block.
### Example 1: Bazel workload

```proto
benchmarks {
  name: "my_bazel_benchmark"
  description: "Runs a simple Bazel target."
  owner: "my-team"

  workload {
    bazel_workload {
      execution_target: "//my_project:my_benchmark_binary"
    }
    runtime_flags: "--model_name=resnet"
  }

  hardware_configs {
    hardware_category: CPU_X86
    topology { num_hosts: 1, num_devices_per_host: 1 }
    workflow_type: [PRESUBMIT]

    # Add hardware-specific runtime flags
    runtime_flags: "--precision=fp32"
  }

  update_frequency_policy: QUARTERLY

  metrics {
    # REQUIRED: Must match the TensorBoard tag name (e.g., 'wall_time' in the log)
    name: "wall_time"
    unit: "ms"

    stats {
      stat: MEAN
      comparison: {
        # Configures static threshold analysis against a baseline
        baseline { value: 100.0 }
        threshold { value: 0.1 }
        improvement_direction: LESS
      }
    }

    stats {
      stat: P99
    }
  }
}
```
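To build intuition for the `comparison` block, here is an illustrative sketch of how a static threshold check might evaluate the MEAN statistic. It assumes `threshold` is a relative tolerance around `baseline`; consult the comments in `benchmark_registry.proto` for the authoritative semantics.

```python
def violates_threshold(stat_value: float, baseline: float, threshold: float,
                       improvement_direction: str) -> bool:
    """Illustrative check only, assuming `threshold` is a relative tolerance.

    With baseline=100.0, threshold=0.1, improvement_direction="LESS",
    a MEAN wall_time above 110.0 ms would be flagged as a regression.
    """
    if improvement_direction == "LESS":       # lower is better
        return stat_value > baseline * (1 + threshold)
    if improvement_direction == "GREATER":    # higher is better
        return stat_value < baseline * (1 - threshold)
    raise ValueError(f"Unknown improvement_direction: {improvement_direction}")


# Example values taken from the registry above.
print(violates_threshold(101.2, baseline=100.0, threshold=0.1,
                         improvement_direction="LESS"))  # False: within tolerance
```
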
### Example 2: Python workload

```proto
benchmarks {
  name: "my_python_benchmark"
  description: "Runs a pip-based Python script."
  owner: "my-team"

  workload {
    python_workload {
      script_path: "benchmarking/scripts/run_pallas.py"
      python_version: "3.11"

      # The directory containing your pyproject.toml
      pip_project_path: "."

      # Optional dependencies from your pyproject.toml [project.optional-dependencies]
      pip_optional_dependencies: "test"
    }
    runtime_flags: "--model_name=my_kernel"
  }

  hardware_configs {
    hardware_category: GPU_L4
    topology { num_hosts: 1, num_devices_per_host: 1 }
    workflow_type: [PRESUBMIT]

    # Add hardware-specific optional dependencies
    pip_optional_dependencies: "cuda"

    # Add hardware-specific runtime flags
    runtime_flags: "--use_gpu"
  }

  update_frequency_policy: QUARTERLY

  metrics {
    # REQUIRED: Must match the TensorBoard tag name
    name: "throughput"
    unit: "samples/sec"

    stats {
      stat: MEAN
      comparison: {
        baseline { value: 5000.0 }
        threshold { value: 0.05 }
        improvement_direction: GREATER
      }
    }

    stats {
      stat: MIN
    }
  }
}
```
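The `pip_optional_dependencies` values refer to extras declared under `[project.optional-dependencies]` in your `pyproject.toml`. As a rough illustration (the package names are placeholders, not infrastructure requirements), the relevant sections might look like:

```toml
[project]
name = "my-benchmarks"
version = "0.1.0"
dependencies = [
  "numpy",
]

[project.optional-dependencies]
# Extras referenced by pip_optional_dependencies in the registry above.
test = ["pytest"]
cuda = ["jax[cuda12]"]
```
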
## Step 3: Log metrics via TensorBoard

To integrate with the infrastructure, your benchmark script must log its metrics via TensorBoard.

The reusable workflow provides a standard environment variable, `TENSORBOARD_OUTPUT_DIR`, which points to the directory where TensorBoard must write event files.
Example script:

```python
import tensorflow as tf
import os
import sys
import numpy as np

# Get the output directory from the infrastructure.
tblog_dir = os.environ.get("TENSORBOARD_OUTPUT_DIR")

if not tblog_dir:
    print("Error: TENSORBOARD_OUTPUT_DIR env var not set.", file=sys.stderr)
    sys.exit(1)

print("Running benchmark...")
fake_data = np.array([101.2, 100.5, 102.1, 99.8, 101.5])

# Write the raw data to TensorBoard.
try:
    writer = tf.summary.create_file_writer(tblog_dir)
    with writer.as_default():
        for i, value in enumerate(fake_data):
            # The tag "wall_time" MUST match the "name" in your MetricSpec.
            tf.summary.scalar("wall_time", value, step=i)

    writer.flush()
    writer.close()
    print("Successfully wrote metrics.")

except Exception as e:
    print(f"Error writing TensorBoard logs: {e}", file=sys.stderr)
    sys.exit(1)
```
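Your script also needs to consume the registry's `runtime_flags` (e.g., `--model_name=resnet`). A minimal sketch, assuming the flags are appended to the script's command line (verify this against the reusable workflow's behavior for your workload type):

```python
import argparse


def parse_runtime_flags():
    """Parse flags that the registry's runtime_flags may pass on the command line."""
    parser = argparse.ArgumentParser(description="Benchmark entry point.")
    # Flag names below mirror the example registry; adjust them to your own workload.
    parser.add_argument("--model_name", default="resnet")
    parser.add_argument("--precision", default="fp32")
    # Tolerate flags this script does not recognize.
    args, _unknown = parser.parse_known_args()
    return args


if __name__ == "__main__":
    args = parse_runtime_flags()
    print(f"Benchmarking model={args.model_name} precision={args.precision}")
```
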
## Step 4: Add repository to runner registry

Finally, the infrastructure needs to know what hardware is available to your repository. This is defined in a central "allowlist" file: [gha_runners.json](https://github.com/google-ml-infra/actions/blob/main/benchmarking/config/gha_runners.json).

If your workflow fails with an error like `Error: No runner pool defined for repository 'my-org/my-repo'`, please file a bug to have your repository and its available runners added to this file.