run-perf-gen - runs the generator script to generate 3 sets of 10k, 100k & 1M entries
run-perf-build - uses the 3 sets to run build_report, with the default set of sheets, and with all available sheets
extract-timings - converts run-perf-build output to a table of count, variant, memory, time
generator.py - loads a tarball, duplicates and randomizes data, and saves to a new tarball; wrapped by run-perf-gen
# run once - takes about 5 minutes
tools/perf/run-perf-gen
# run for every tested branch - takes about an hour
tools/perf/run-perf-build | tee tmp
tools/perf/extract-timings tmp
For example, to compare recent PRs:
#!/bin/sh
set -e
tools/perf/run-perf-gen
git log --oneline --since='last week' devel | sed -e 's/ .*(#\([0-9]\+\))$/ \1/' | while read sha pr; do
echo; echo '----------------------- PR #'$pr; echo
git checkout "$sha"
tools/perf/run-perf-build | tee pr$pr
echo; echo
done
for pr in $(ls --sort=time -r pr*); do
tools/perf/extract-timings $pr
done
Wraps generator.py, calling it once each for 10k, 100k and 1M entries. Uses the same counts for each table.
The outputs are written into separate months: 2026-01 for 10k, 2026-02 for 100k and 2026-03 for 1M.
Use once at the start of a perf test.
Eg. tools/perf/run-perf-gen
Outputs tarballs in tools/perf/generated/data/.
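The wrapper's behavior can be sketched as a loop over the three target sizes, steering generator.py through its environment variables (names follow the generator.py section; the month-to-date mapping and the per-table variable `JOBS_SIZE` are illustrative assumptions, not the script's actual code):

```python
# Hypothetical sketch of what a run-perf-gen-style wrapper does: call
# generator.py once per target size, configured via environment variables.
import os
import subprocess

SIZES = {
    "2026-01": 10_000,     # 10k entries
    "2026-02": 100_000,    # 100k entries
    "2026-03": 1_000_000,  # 1M entries
}

def gen_env(month, size):
    """Environment for one generator.py invocation (one month, one size)."""
    return dict(
        os.environ,
        OUTPUT_DATE_FROM=f"{month}-01",
        OUTPUT_DATE_TO=f"{month}-28",
        OUTPUT_DATA_PATH="tools/perf/generated/data/",
        JOBS_SIZE=str(size),  # hypothetical per-table *_SIZE variable
    )

def run_generation(generator="tools/perf/generator.py"):
    for month, size in SIZES.items():
        subprocess.run(["python", generator], env=gen_env(month, size), check=True)
```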
Runs metrics-utility build_report on the generated datasets, and measures memory consumption and time.
(Technically, CPU consumption is also measured, but all of this is single-threaded Python, so it is always 100%.)
It runs the CCSPv2 report, first with default options, and then again with all optional sheets and deduplication enabled.
Use after run-perf-gen, in each branch/config variant you want to test; tee stdout to a file for later processing.
Eg. tools/perf/run-perf-build | tee tmp.devel
Outputs reports in tools/perf/generated/reports/.
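The kind of measurement run-perf-build performs can be sketched in Python using the stdlib `resource` module (an illustrative re-implementation, not the script's actual code):

```python
# Illustrative: measure wall time and peak memory of a child process.
import resource
import subprocess
import time

def measure(cmd):
    """Run cmd; return (peak RSS of children in KiB on Linux, wall seconds).

    RUSAGE_CHILDREN aggregates the maximum RSS across all waited-for
    children, so run one measurement per process for clean numbers.
    """
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return peak, elapsed
```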
Parses the output generated by run-perf-build and converts it to a TSV that can be pasted into a Drive sheet.
Eg. tools/perf/extract-timings tmp.devel
rows    variant  memory   time
10000   default  161792K   26.09s
10000   all      162524K   29.58s
100000  default  514496K  214.53s
100000  all      544204K  228.76s
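Emitting the table as tab-separated values is the easy half of this; a minimal sketch (the real extract-timings also parses the run-perf-build output first, so the hard-coded records here are just the example rows above):

```python
# Illustrative: render measurement records as TSV for pasting into a sheet.
def to_tsv(records, header=("rows", "variant", "memory", "time")):
    lines = ["\t".join(header)]
    lines += ["\t".join(str(field) for field in rec) for rec in records]
    return "\n".join(lines) + "\n"

example = [
    (10000, "default", "161792K", "26.09s"),
    (10000, "all", "162524K", "29.58s"),
]
```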
1. loads a set of tarballs from SOURCE_DATA_PATH (glob),
2. removes data outside the INPUT_DATE_FROM - INPUT_DATE_TO range (defaults to the current year),
3. duplicates all data multiple times to reach a target *_UNIQUE_SIZE (per table),
4. randomizes hostnames,
5. duplicates all data again to reach the target *_SIZE,
6. randomizes timestamps (within the target interval between OUTPUT_DATE_FROM & OUTPUT_DATE_TO),
7. randomizes some product_serial & machine_id facts,
8. outputs a new set of tarballs to OUTPUT_DATA_PATH (dir, eg. test_data/data/).
The counts can be adjusted per table; tables can also be skipped by setting SELECTED_DATA (comma-separated).
(Not expected to run this manually, use the run-perf-gen wrapper.)
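The duplicate-then-randomize pipeline can be sketched on a single in-memory table, using plain dicts instead of tarballs (the `hostname` and `created` field names are illustrative, not the generator's actual schema):

```python
# Illustrative sketch of the generator's core transformation on one table.
import random
from datetime import datetime, timedelta

def grow(rows, unique_size, total_size, date_from, date_to, seed=0):
    rng = random.Random(seed)
    # 1. duplicate rows until the table reaches the target unique size
    reps = (unique_size + len(rows) - 1) // len(rows)
    rows = [dict(r) for _ in range(reps) for r in rows][:unique_size]
    # 2. randomize hostnames so the duplicates become distinct entries
    for r in rows:
        r["hostname"] = f"host-{rng.randrange(10**8):08d}"
    # 3. duplicate again to reach the full target size (hostnames now repeat)
    reps = (total_size + len(rows) - 1) // len(rows)
    rows = [dict(r) for _ in range(reps) for r in rows][:total_size]
    # 4. randomize timestamps within the target output interval
    span = int((date_to - date_from).total_seconds())
    for r in rows:
        r["created"] = date_from + timedelta(seconds=rng.randrange(span))
    return rows
```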