Two workflows:
- Headless timing (recommended): post-warmup render throughput for N frames at 1 or 5 samples per pixel; compare logs between builds.
- Scripted sequencer: multi-step `.cfg` matrix with per-stage GPU profiler stats (optional, heavier).
Render 500 frames (or any count) with path tracing. Set --frames and --maxFrames to the same value so every app frame accumulates samples (default maxFrames is already 500).

    ./vk_gltf_renderer --headless --size 1920 1080 \
        --scenefile shader_ball.gltf \
        --hdrfile std_env.hdr \
        --frames 500 --maxFrames 500 \
        --ptSamples 1 --ptAdaptiveSampling 0 \
        --renderSystem 0 --envSystem 1

While rendering, periodic progress lines confirm the run is not stuck (every 50 frames or 5 seconds):

    HEADLESS_START frames=500 maxFrames=500 ptSamples=1
    HEADLESS_PROGRESS app_frame 50/500 (10%) elapsed_ms=1234.5 ms_per_frame=24.69
    ...
    HEADLESS_SUMMARY frames=500 maxFrames=500 ptSamples=1 effective_spp=500 measured_effective_spp=499 resolution=1920x1080 wall_ms=12320.987 ms_per_frame=24.691 total_wall_ms=12345.678 total_ms_per_frame=24.691 warmup_frames=1 measured_frames=499 throughput_MSps=84.0 spp_per_sec=40.49
    BENCHMARK_JSON {"schema":1,"type":"headless_summary",...}

- `app_frame`: headless loop index (`--frames`).
- In headless mode, `main()` raises `--maxFrames` to at least `--frames` if you set it lower, so every app frame can accumulate samples during timing runs.
- `wall_ms`: measured post-warmup render time. The first completed frame is excluded so one-time setup such as shader specialization is not charged to throughput.
- `total_wall_ms`: full headless render-loop wall time, including warmup and any synchronous setup.
- `ms_per_frame`: `wall_ms / measured_frames`.
- `effective_spp`: total final-image accumulation, `min(frames, maxFrames) × ptSamples`.
- `measured_effective_spp`: accumulation covered by the measured post-warmup window.
- `throughput_MSps`: measured mega pixel-samples per second (`resolution × measured_effective_spp / wall_s / 10⁶`; higher is faster).
- `spp_per_sec`: `measured_effective_spp / wall_s` at this resolution (how fast quality accumulates; higher is faster).
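
The formulas above can be checked by hand against a `HEADLESS_SUMMARY` line. The following is a minimal sketch, not the renderer's own code: `derive_metrics` is a hypothetical helper, and its arguments are simply the values printed in the sample summary.

```python
def derive_metrics(width, height, frames, max_frames, pt_samples,
                   wall_ms, measured_frames, measured_effective_spp):
    """Recompute the derived summary fields from the logged raw values."""
    wall_s = wall_ms / 1000.0  # post-warmup wall time in seconds
    return {
        # wall_ms / measured_frames
        "ms_per_frame": wall_ms / measured_frames,
        # min(frames, maxFrames) x ptSamples
        "effective_spp": min(frames, max_frames) * pt_samples,
        # resolution x measured_effective_spp / wall_s / 10^6
        "throughput_MSps": width * height * measured_effective_spp / wall_s / 1e6,
        # measured_effective_spp / wall_s
        "spp_per_sec": measured_effective_spp / wall_s,
    }


# Values taken from the sample HEADLESS_SUMMARY line (1920x1080, 1 spp).
metrics = derive_metrics(1920, 1080, 500, 500, 1, 12320.987, 499, 499)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

Running the same helper with `pt_samples=5` gives `effective_spp` of 2500 for a 500-frame run.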
Repeat with --ptSamples 5 for 5 spp per frame (effective_spp=2500).

    python utils/benchmark/benchmark.py headless --scene resources/shader_ball.gltf --frames 500 --spp 1 5

- Logs: `utils/benchmark/output/headless_<scene>_spp<N>.log`
- CSV: `utils/benchmark/output/headless_results.csv`

Compare two builds (same scene/spp, different executable):

    python utils/benchmark/benchmark.py headless-compare \
        utils/benchmark/output/headless_shader_ball_spp1_baseline.log \
        utils/benchmark/output/headless_shader_ball_spp1_candidate.log

(Use one log from build A and one from build B.)

Scripted benchmarks measure GPU frame time and VRAM usage across scenes, cameras, and renderer settings. Output is designed for regression testing when comparing builds or branches.
Build the sample, then from the project directory:

    # Fast smoke benchmark (one scene)
    python utils/benchmark/benchmark.py run quick.cfg --scene resources/shader_ball.gltf --hdr std_env.hdr

    # Full matrix (edit utils/benchmark/scenes.example.txt for your asset paths)
    python utils/benchmark/benchmark.py run matrix.cfg \
        --scenes-file utils/benchmark/scenes.example.txt \
        --scenes-root . \
        --csv-name benchmark_results.csv

Results land in `utils/benchmark/output/` (logs per scene + CSV). See also `utils/benchmark/README.md`.

- `--benchmark 1` turns off vsync, hides side panels (keeps a fullscreen viewport with the tonemapped image), and drains the scene load pipeline synchronously each frame.
- `ElementSequencer` steps through a `.cfg` script (`SEQUENCE "name"` blocks).
- After each sequence, `ProfilerManager` logs `ParameterSequence` blocks at log level `eSTATS` (GPU/CPU timer averages).
- `benchmarkAdvance()` records Scene and PathTracer/Rasterizer VRAM stats.
- When the script finishes, the app closes automatically.

Example (utils/benchmark/quick.cfg):

    SEQUENCE "Path tracer - 1 spp"
    --sequenceframes 512
    --sequenceaverages 128
    --sequenceresetframes 16
    --renderSystem 0
    --ptSamples 1
    --maxFrames 1
    --ptAdaptiveSampling 0
    --gltfCamera 0
    --updateData

| Token | Meaning |
|---|---|
| `SEQUENCE "..."` | Starts a new measured step |
| `--sequenceframes` | Frames to run this step |
| `--sequenceaverages` | Frames averaged for profiler report |
| `--sequenceresetframes` | Warmup frames after parameter changes (0 = measure immediately) |
| Other `--flags` | Any registered CLI parameter (renderer, path tracer, tonemapper, etc.) |

Path tracer note: Set --maxFrames to match --ptSamples when measuring convergence cost. Use --maxFrames 1 with --ptSamples 1 for per-frame interactive GPU time.
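
For larger matrices it can be easier to generate the `.cfg` script than to write every `SEQUENCE` block by hand. The sketch below is one possible approach and is not part of `utils/benchmark/benchmark.py`: `make_sequence` and `make_matrix` are hypothetical helpers, the flags are the ones used in `quick.cfg`, and `--maxFrames` is set equal to `--ptSamples` as suggested in the note above. It emits one token per line with no blank lines between blocks, since blank-line handling by the sequencer is not documented here.

```python
def make_sequence(name, flags, frames=512, averages=128, reset_frames=16):
    """Return one SEQUENCE block as text, one token per line."""
    lines = [
        f'SEQUENCE "{name}"',
        f"--sequenceframes {frames}",
        f"--sequenceaverages {averages}",
        f"--sequenceresetframes {reset_frames}",
    ]
    for flag, value in flags:
        # Bool triggers such as --updateData take no value.
        lines.append(flag if value is None else f"{flag} {value}")
    return "\n".join(lines)


def make_matrix(spp_values, cameras):
    """Build one path tracer sequence per (spp, camera) combination."""
    blocks = []
    for spp in spp_values:
        for camera in cameras:
            flags = [
                ("--renderSystem", 0),
                ("--ptSamples", spp),
                ("--maxFrames", spp),  # assumption: match --ptSamples
                ("--ptAdaptiveSampling", 0),
                ("--gltfCamera", camera),
                ("--updateData", None),
            ]
            name = f"Path tracer - {spp} spp - camera {camera}"
            blocks.append(make_sequence(name, flags))
    return "\n".join(blocks) + "\n"


# Two sample counts x two cameras -> four SEQUENCE blocks.
script = make_matrix([1, 5], [0, 1])
print(script)
```

Write the returned text to a file such as `my_matrix.cfg` (a hypothetical name) and pass it to `benchmark.py run` in place of `matrix.cfg`.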

    # Baseline build
    python utils/benchmark/benchmark.py run matrix.cfg --scene my_scene.gltf --csv-name baseline.csv

    # Candidate build (rebuild executable first)
    python utils/benchmark/benchmark.py run matrix.cfg --scene my_scene.gltf --csv-name candidate.csv

    python utils/benchmark/benchmark.py compare baseline.csv candidate.csv --output diff.csv --regression-threshold-pct 5

`compare` marks Regression when candidate GPU time is more than N% slower than baseline. Negative delta % means faster.

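
The regression rule can be illustrated with a few lines of Python. This is a sketch of the rule as described, not the code in `benchmark.py`: the function name is hypothetical, and it assumes the delta is computed relative to the baseline GPU time.

```python
def compare_gpu_time(baseline_ms, candidate_ms, threshold_pct=5.0):
    """Return (delta_pct, is_regression) for one timer.

    delta_pct is positive when the candidate is slower and negative
    when it is faster. Assumes the delta is relative to the baseline.
    """
    delta_pct = (candidate_ms - baseline_ms) / baseline_ms * 100.0
    return delta_pct, delta_pct > threshold_pct


# Candidate 10% slower than baseline: flagged at a 5% threshold.
print(compare_gpu_time(10.0, 11.0))
# Candidate 10% faster: negative delta, not a regression.
print(compare_gpu_time(10.0, 9.0))
```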
utils/benchmark/benchmark.py reads stable BENCHMARK_JSON records first, with legacy text parsing as a fallback:

- `BENCHMARK_JSON {"schema":1,"type":"headless_summary",...}`
- `BENCHMARK_JSON {"schema":1,"type":"sequence_memory",...}`
- `ParameterSequence N "name" = { Timer "..."; GPU; avg ...; CPU; avg ...; }`
- `BENCHMARK_ADV N { Memory Scene; ... Memory PathTracer; ... }`

Auto-generated log: log_<executable>.txt next to the binary (Logger behavior).
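
If you want to read these logs from your own tooling, the same "JSON first, text fallback" idea can be sketched as follows. This is an illustrative parser, not the one in `benchmark.py`: `parse_record` and `_coerce` are hypothetical names, only the `HEADLESS_SUMMARY` text form is handled in the fallback, and the JSON payload is returned as-is because its fields beyond `schema` and `type` are not listed here.

```python
import json

JSON_PREFIX = "BENCHMARK_JSON "
SUMMARY_PREFIX = "HEADLESS_SUMMARY "


def _coerce(value):
    """Convert a token value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # e.g. resolution=1920x1080 stays a string


def parse_record(line):
    """Parse one log line into a dict, or return None if it is not a record."""
    line = line.strip()
    if line.startswith(JSON_PREFIX):
        # Preferred path: structured record after the prefix.
        return json.loads(line[len(JSON_PREFIX):])
    if line.startswith(SUMMARY_PREFIX):
        # Fallback path: space-separated key=value tokens.
        record = {"type": "headless_summary"}
        for token in line.split()[1:]:
            key, _, value = token.partition("=")
            record[key] = _coerce(value)
        return record
    return None


sample = "HEADLESS_SUMMARY frames=500 resolution=1920x1080 wall_ms=12320.987"
print(parse_record(sample))
```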
- Use fixed resolution (`--size 1920 1080` in `utils/benchmark/benchmark.py`) for comparable numbers.
- Disable validation layers for performance runs (`--vvl` off by default in Release).
- Add scenes to `utils/benchmark/scenes.example.txt` (name + relative path per line).
- Multi-camera scenes: add sequences with `--gltfCamera 0`, `--gltfCamera 1`, etc.
- After large setting changes, use `--updateData` or `--resetFrame` (no value; bool triggers) and non-zero `--sequenceresetframes`.