README.md

measuring metrics-utility build_report performance

  • run-perf-gen - runs the generator script to generate 3 sets of 10k, 100k & 1M entries
  • run-perf-build - uses the 3 sets to run build_report, with the default set of sheets, and with all available sheets
  • extract-timings - converts run-perf-build output to a table of count, variant, memory, time
  • generator.py - loads a tarball, duplicates and randomizes data, and saves to new tarball - wrapped by run-perf-gen

TLDR

# run once - takes about 5 minutes
tools/perf/run-perf-gen

# run for every tested branch - takes about an hour
tools/perf/run-perf-build | tee tmp
tools/perf/extract-timings tmp

For example, to compare recent PRs:

#!/bin/sh
set -e

tools/perf/run-perf-gen

git log --oneline --since='last week' devel | sed -e 's/ .*(#\([0-9]\+\))$/ \1/' | while read sha pr; do
  echo; echo '----------------------- PR #'$pr; echo
  git checkout "$sha"
  tools/perf/run-perf-build | tee pr$pr
  echo; echo
done

# the glob already yields the full file names (pr<number>), oldest first
for file in `ls --sort=time -r pr*`; do
  tools/perf/extract-timings "$file"
done

run-perf-gen

Wraps generator.py, calling it once for each of 10k, 100k and 1M entries, with the same count used for every table. Each set is written to a separate month: 2026-01 for 10k, 2026-02 for 100k and 2026-03 for 1M.

Run once at the start of a perf test.

Eg. tools/perf/run-perf-gen

Outputs tarballs in tools/perf/generated/data/.
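
The count-to-month mapping above can be sketched in Python. This is an illustration only, not the wrapper's actual code: the SETS dict and both function names are hypothetical, and it assumes the wrapper passes each month to generator.py through OUTPUT_DATE_FROM and OUTPUT_DATE_TO as ISO dates, which this README does not confirm.

```python
import calendar
from datetime import date

# Row counts and target months as described above; the dict name is illustrative.
SETS = {10_000: (2026, 1), 100_000: (2026, 2), 1_000_000: (2026, 3)}


def month_window(year: int, month: int) -> tuple[date, date]:
    """Return the first and last day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def generator_env(rows: int) -> dict[str, str]:
    """Build a hypothetical environment for one generator.py run."""
    start, end = month_window(*SETS[rows])
    return {
        # Assumed date format; the real wrapper may use a different one.
        "OUTPUT_DATE_FROM": start.isoformat(),
        "OUTPUT_DATE_TO": end.isoformat(),
    }


if __name__ == "__main__":
    for rows in SETS:
        print(rows, generator_env(rows))
```

Keeping each dataset in its own month means a report for one month only ever sees one dataset size.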

run-perf-build

Runs metrics-utility build_report on the generated datasets, and measures memory consumption and time. (Technically, CPU consumption is also measured, but all of this is single-threaded Python, so it is always 100%.)

It runs the CCSPv2 report, first with default options, and then again with all optional sheets and deduplication enabled.

Use after run-perf-gen, in each branch/config variant you want to test; tee stdout to a file for later processing. Eg. tools/perf/run-perf-build | tee tmp.devel

Outputs reports in tools/perf/generated/reports/.
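
For a quick ad-hoc measurement outside the script, the same two numbers (peak memory and wall time) can be approximated from Python. This is a minimal sketch, not how run-perf-build itself measures: the measure helper is hypothetical, it relies on the Unix-only resource module, and the command in the usage part is a stand-in rather than a real build_report invocation.

```python
import resource
import subprocess
import sys
import time


def measure(cmd: list[str]) -> tuple[float, int]:
    """Run cmd to completion; return (wall seconds, peak RSS of children)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest peak RSS among all children waited for so far,
    # reported in KiB on Linux (bytes on macOS). Measure one command per
    # Python process if the numbers are going to be compared.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


if __name__ == "__main__":
    # Stand-in workload; replace with the command you want to time.
    seconds, peak_rss = measure([sys.executable, "-c", "sum(range(10**6))"])
    print(f"{peak_rss}K\t{seconds:.2f}s")
```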

extract-timings

Runs on the output generated by run-perf-build and converts it to TSV that can be pasted into a Drive sheet.

Eg. tools/perf/extract-timings tmp.devel

rows	variant	memory	time
10000	default	161792K	26.09s
10000	all	162524K	29.58s
100000	default	514496K	214.53s
100000	all	544204K	228.76s
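
If a spreadsheet is not at hand, the table can also be post-processed directly. The sketch below parses the TSV shown above and derives the average time per row; parse_timings and ms_per_row are hypothetical helpers, not part of tools/perf, and the K and s suffixes are simply stripped as they appear in the sample.

```python
import csv
import io

# The sample output from above, tab-separated.
SAMPLE = (
    "rows\tvariant\tmemory\ttime\n"
    "10000\tdefault\t161792K\t26.09s\n"
    "10000\tall\t162524K\t29.58s\n"
    "100000\tdefault\t514496K\t214.53s\n"
    "100000\tall\t544204K\t228.76s\n"
)


def parse_timings(text: str) -> list[dict]:
    """Parse extract-timings TSV into entries with numeric fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "rows": int(r["rows"]),
            "variant": r["variant"],
            "memory_k": int(r["memory"].rstrip("K")),
            "seconds": float(r["time"].rstrip("s")),
        }
        for r in reader
    ]


def ms_per_row(entry: dict) -> float:
    """Average wall time per input row, in milliseconds."""
    return 1000 * entry["seconds"] / entry["rows"]


if __name__ == "__main__":
    for entry in parse_timings(SAMPLE):
        print(entry["rows"], entry["variant"], f"{ms_per_row(entry):.3f} ms/row")
```

Comparing time per row across dataset sizes gives a rough indication of whether build_report scales linearly with the input.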

generator.py

Generates a new set of tarballs from existing ones, in these steps:

  • loads a set of tarballs from SOURCE_DATA_PATH (glob)
  • removes data outside the INPUT_DATE_FROM - INPUT_DATE_TO range (defaults to the current year)
  • duplicates all data multiple times to reach a target *_UNIQUE_SIZE (per table)
  • randomizes hostnames
  • duplicates all data again to reach a target *_SIZE
  • randomizes timestamps (within the target interval between OUTPUT_DATE_FROM & OUTPUT_DATE_TO)
  • randomizes some product_serial & machine_id facts
  • outputs a new set of tarballs to OUTPUT_DATA_PATH (dir, eg. test_data/data/)

The counts can be adjusted per table; tables can also be skipped by setting SELECTED_DATA (comma-separated).

(This is not expected to be run manually; use the run-perf-gen wrapper.)
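
The duplicate, randomize, duplicate again idea can be illustrated on plain in-memory rows. This is a simplified sketch, not the code in generator.py: grow and generate are hypothetical names, rows are assumed to be dicts with hostname and timestamp keys, and the tarball I/O, the input date filter and the product_serial / machine_id randomization are left out.

```python
import random
from datetime import datetime, timedelta
from itertools import cycle, islice


def grow(rows: list[dict], target: int) -> list[dict]:
    """Repeat rows (as copies) until there are exactly `target` of them."""
    return [dict(r) for r in islice(cycle(rows), target)]


def generate(rows, unique_size, size, date_from, date_to, seed=0):
    """Expand rows to `size` entries spread over `unique_size` hostnames."""
    rng = random.Random(seed)
    # 1. Duplicate to the unique size and give every copy its own hostname.
    unique = grow(rows, unique_size)
    for i, row in enumerate(unique):
        row["hostname"] = f"host-{i}-{rng.getrandbits(32):08x}"
    # 2. Duplicate again to the full size; hostnames now repeat.
    out = grow(unique, size)
    # 3. Spread timestamps uniformly over the output interval.
    span = (date_to - date_from).total_seconds()
    for row in out:
        row["timestamp"] = date_from + timedelta(seconds=rng.uniform(0, span))
    return out


if __name__ == "__main__":
    seed_rows = [{"hostname": "a"}, {"hostname": "b"}]
    data = generate(seed_rows, 5, 12, datetime(2026, 1, 1), datetime(2026, 2, 1))
    print(len(data), len({r["hostname"] for r in data}))
```

Randomizing hostnames between the two duplication passes is what lets the unique-host count and the total row count be set independently.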