| 1 | +# vk-bench (Level 0 Vulkan micro-benchmark) |
| 2 | + |
| 3 | +**Purpose** |
| 4 | + |
| 5 | +A containerized Vulkan micro-benchmark that runs one controlled workload at a time and produces repeatable performance numbers plus Nsight captures. |
| 6 | + |
| 7 | +## What this benchmarks |
| 8 | + |
| 9 | +This Level 0 repo is intentionally scoped to **one executable** (`vk-bench`) and **three micro-scenes**: |
| 10 | + |
| 11 | +- `triangle` (sanity / low work) |
| 12 | +- `million-tris` (heavy transfer load that stands in for a draw-heavy frame) |
| 13 | +- `compute-copy` (bandwidth-focused transfer load) |
| 14 | + |
| 15 | +No assets, textures, or engine features are included. |
| 16 | + |
| 17 | +## How to run |
| 18 | + |
| 19 | +```bash |
| 20 | +docker build -t vk-bench . |
| 21 | +docker run --rm --gpus all -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix vk-bench |
| 22 | +docker run --rm --gpus all vk-bench --headless --frames 300 --out results.json |
| 23 | +``` |
| 24 | + |
| 25 | +### Bench all 3 scenes |
| 26 | + |
| 27 | +```bash |
| 28 | +scripts/run_bench.sh results |
| 29 | +``` |
| 30 | + |
| 31 | +## Example results |
| 32 | + |
| 33 | +```json |
| 34 | +{ |
| 35 | + "scene": "million-tris", |
| 36 | + "cpu_frame_time_ms": {"avg": 0.0731, "p50": 0.0613, "p95": 0.1188}, |
| 37 | + "gpu_frame_time_ms": {"avg": 2.4182, "p50": 2.3395, "p95": 2.7560} |
| 38 | +} |
| 39 | +``` |
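A results file with the shape shown above can be consumed by a short script. The following Python sketch assumes the key names match the example exactly; `format_summary` is a hypothetical helper for illustration, not part of vk-bench.

```python
import json

# Example payload copied from the README; a real run would read the --out file.
EXAMPLE = """
{
  "scene": "million-tris",
  "cpu_frame_time_ms": {"avg": 0.0731, "p50": 0.0613, "p95": 0.1188},
  "gpu_frame_time_ms": {"avg": 2.4182, "p50": 2.3395, "p95": 2.7560}
}
"""

def format_summary(result):
    """Render one line per timer (hypothetical helper, assumed key names)."""
    lines = [f"scene: {result['scene']}"]
    for timer in ("cpu_frame_time_ms", "gpu_frame_time_ms"):
        stats = result[timer]
        lines.append(
            f"{timer}: avg={stats['avg']:.4f} "
            f"p50={stats['p50']:.4f} p95={stats['p95']:.4f}"
        )
    return "\n".join(lines)

print(format_summary(json.loads(EXAMPLE)))
```

Keeping the formatter separate from file I/O makes it easy to point the same function at any file written via `--out`.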
| 40 | + |
| 41 | + |
| 42 | + |
| 43 | +## How timing is measured (CPU/GPU) |
| 44 | + |
| 45 | +- **GPU frame time**: Vulkan timestamp queries (`vkCmdWriteTimestamp`) around the workload command region. |
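| 46 | +- **CPU frame time (submission)**: host timer around the `vkQueueSubmit` call. This captures submission overhead only, because `vkQueueSubmit` returns before the GPU finishes executing the work. |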
| 47 | +- Per-frame values are recorded and summarized as `avg`, `p50`, and `p95` in JSON. |
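The measurement scheme above can be sketched in Python under a few stated assumptions. Vulkan timestamps are raw ticks, and `VkPhysicalDeviceLimits::timestampPeriod` gives the number of nanoseconds per tick, so a begin/end pair converts to milliseconds as shown. The function names, the nearest-rank percentile method, and the sample tick values are illustrative assumptions; vk-bench's actual implementation may differ (for example, it may interpolate percentiles).

```python
import math

def ticks_to_ms(begin_ticks, end_ticks, timestamp_period_ns):
    """Convert a begin/end pair of GPU timestamp ticks to milliseconds."""
    return (end_ticks - begin_ticks) * timestamp_period_ns / 1e6

def percentile(samples, pct):
    """Nearest-rank percentile (assumed method, not confirmed by the source)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(frame_times_ms):
    """Reduce per-frame values to the avg/p50/p95 fields used in the JSON."""
    return {
        "avg": sum(frame_times_ms) / len(frame_times_ms),
        "p50": percentile(frame_times_ms, 50),
        "p95": percentile(frame_times_ms, 95),
    }

# Hypothetical tick pairs, assuming a timestamp period of 1 ns per tick.
pairs = [(0, 2_000_000), (10, 2_500_010), (20, 3_000_020), (30, 2_200_030)]
gpu_ms = [ticks_to_ms(begin, end, 1.0) for begin, end in pairs]
print(summarize(gpu_ms))
```

The same `summarize` function applies to the CPU submission timings, since both are recorded as one value per frame.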
| 48 | + |
| 49 | +## Nsight steps (exact command) |
| 50 | + |
| 51 | +```bash |
| 52 | +scripts/nsight_capture.sh results/nsight_capture |
| 53 | +``` |
| 54 | + |
| 55 | +Or directly: |
| 56 | + |
| 57 | +```bash |
| 58 | +nsys profile --trace=vulkan,nvtx,cuda --output results/nsight_capture \ |
| 59 | + vk-bench --headless --scene million-tris --warmup 20 --frames 120 --out results/nsight_capture.json |
| 60 | +``` |
| 61 | + |
| 62 | + |
| 63 | + |
| 64 | +## Container GPU access options |
| 65 | + |
| 66 | +### Option A: Linux host + NVIDIA (recommended) |
| 67 | + |
| 68 | +- Use `--gpus all` (requires the NVIDIA Container Toolkit on the host). |
| 69 | +- Verify that the Vulkan loader and the ICD are visible inside the container: |
| 70 | + |
| 71 | +```bash |
| 72 | +docker run --rm --gpus all vk-bench vulkaninfo --summary |
| 73 | +``` |
| 74 | + |
| 75 | +### Option B: Headless-only explicit ICD config |
| 76 | + |
| 77 | +If the Vulkan loader cannot discover the ICD automatically, point it at the manifest explicitly: |
| 78 | + |
| 79 | +```bash |
| 80 | +export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json |
| 81 | +vk-bench --headless --frames 300 --out results.json |
| 82 | +``` |
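Before exporting `VK_ICD_FILENAMES`, it can help to confirm that the manifest exists and is well formed. The sketch below is a hypothetical pre-flight check, not part of vk-bench: it relies only on the documented ICD manifest layout (an `ICD` object with `library_path` and `api_version`), and the default path is simply the one used in the example above.

```python
import json
import os

def check_icd_manifest(path):
    """Return a list of problems found in a Vulkan ICD manifest (empty if none)."""
    if not os.path.isfile(path):
        return [f"manifest not found: {path}"]
    try:
        with open(path, encoding="utf-8") as fh:
            manifest = json.load(fh)
    except (OSError, ValueError) as exc:
        return [f"cannot parse manifest: {exc}"]
    icd = manifest.get("ICD") if isinstance(manifest, dict) else None
    if not isinstance(icd, dict):
        return ["missing 'ICD' object"]
    # The loader needs both the driver library and the API version it supports.
    return [f"missing ICD.{key}" for key in ("library_path", "api_version")
            if key not in icd]

if __name__ == "__main__":
    # VK_ICD_FILENAMES may list several manifests separated by os.pathsep.
    default = "/usr/share/vulkan/icd.d/nvidia_icd.json"
    for manifest_path in os.environ.get("VK_ICD_FILENAMES", default).split(os.pathsep):
        issues = check_icd_manifest(manifest_path)
        print(manifest_path, "OK" if not issues else "; ".join(issues))
```

A clean manifest does not guarantee that the driver library itself loads; `vulkaninfo --summary` remains the authoritative check.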
| 83 | + |
| 84 | +## Known limitations |
| 85 | + |
| 86 | +- The current Level 0 workloads use transfer-copy command streams to provide stable timing; they do not yet create a swapchain or a windowed render path. |
| 87 | +- CI can validate the build and formatting, but it produces real GPU benchmark values only when run on a self-hosted GPU runner. |