This is the (start of a) little command line tool to analyze benchmark results from our automated benchmark CI.

At the moment <2025-02-04 Tue 18:16> the tool receives the path to a CSV file with benchmark results for different VM versions. It then performs a fit of choice (currently either a linear or a log-linear function) and produces plots of the fitted function. Further, the fit results are both printed to the terminal and written to a log file.
The basic usage looks like:
#+begin_src sh
./benchmark_analyzer --raw -f <input_csv>
#+end_src

but multiple optional arguments are supported:
#+begin_src sh
NO_COLOR=true ./benchmark_analyzer -h
#+end_src

#+begin_example
Usage:
  benchmark_analyzer [optional-params]
options:
  -h, --help                                                    print this cligen-erated help
  --help-syntax                                                 advanced: prepend,plurals,..
  -f=, --fname=        string       ""                           Input CSV file with benchmarking results.
  -p=, --plotPath=     string       "/tmp"                       Path where the plot files are written to.
  -l=, --logPath=      string       "./logs"                     Path where the log file is written to. CURRENTLY IGNORED.
  --log10              bool         false                        If true all plots will be log10.
  --fitFunc=           FitFunction  logLin                       Function used for fitting.
  -r, --raw            bool         false                        Indicates if the input is raw data or aggregate.
  --logVerbosity=      LogVerbosity lvDefault                    select 1 LogVerbosity
  -t=, --traceSizes=   string       "resources/trace_sizes.csv"  Path to the CSV file containing trace sizes for each benchmark program.
  --testCol=           string       "Test Name"                  set testCol
  -i=, --iterCol=      string       "Iteration"                  set iterCol
  --rTimeCol=          string       "Real Time"                  set rTimeCol
  -u=, --uTimeCol=     string       "User Time"                  set uTimeCol
  -m=, --memCol=       string       "Memory (MB)"                set memCol
  -o=, --outlierThr=   float        3.0                          set outlierThr
  --regressionThr=     float        1.01                         set regressionThr
  --optimizationThr=   float        0.99                         set optimizationThr
  -c=, --compare=      string       ""                           set compare
  --perfDiffThr=       float        0.0                          set perfDiffThr
#+end_example
The two supported models are =linear=, \(f(x) = a \cdot x + b\), and =logLin=,
\(f(x) = a \cdot x \cdot \ln(x) + b \cdot x + c\).
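For concreteness, here is a minimal Nim sketch of the two models in the =p[i]= parameter convention that also appears in the log output below (the proc names are illustrative, not necessarily the tool's actual code):

#+begin_src nim
import math

# Sketch of the two fit models (names illustrative, not the tool's actual code).
proc linear(p: seq[float], x: float): float =
  ## f(x) = a·x + b with a = p[0], b = p[1]
  result = p[0] * x + p[1]

proc logLin(p: seq[float], x: float): float =
  ## f(x) = a·x·ln(x) + b·x + c with a = p[0], b = p[1], c = p[2]
  result = p[0] * x * ln(x) + p[1] * x + p[2]
#+end_src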
By default (and currently always) the log file is written to =./logs/benchmark_analyzer.log=. It contains output of the form:

#+begin_example
[17:45:14] - INFO: Fitting data for: 8/9/24 -- Main trace size (field elements) against Max space (kB)
[17:45:14] - INFO: Fit result for fit function:
[17:45:14] - INFO: `result = p[0] * x + p[1]`
[17:45:14] - INFO: ------------------------------
  χ²      = 864.779 (4 DOF)
  χ²/dof  = 216.195
  NPAR    = 3
  NFREE   = 3
  NPEGGED = 0
  NITER   = 3
  NFEV    = 14
  P[0] = 0.0691786 +/- 0.00095791
  P[1] = 2886.51 +/- 274.384
  P[2] = 1 +/- 0e+00
[17:45:14] - INFO: ------------------------------
[17:45:14] - INFO: Fitting data for: 8/9/24 -- Permutation trace size (extension field elements) against Max space (kB)
[17:45:14] - INFO: Fit result for fit function:
[17:45:14] - INFO: `result = p[0] * x + p[1]`
[17:45:14] - INFO: ------------------------------
  χ²      = 906.715 (4 DOF)
  χ²/dof  = 226.679
  NPAR    = 3
  NFREE   = 3
  NPEGGED = 0
  NITER   = 3
  NFEV    = 14
  P[0] = 0.409423 +/- 0.00569218
  P[1] = -17124.8 +/- 459.657
  P[2] = 1 +/- 0e+00
[17:45:14] - INFO: ------------------------------
#+end_example
Many plots are generated and stored in =plotPath= (see the plotting sketch after this list). We generate:
- individual plots
  - of the raw data of each trace against each metric (user time, real time, space)
  - of the data and its fit against each metric, with the fit parameters embedded into the figure
- combined plots
  - for each metric, a grid of all traces with their fits. For the moment these do not include the fit parameters, as that would be a bit too crowded (we can decide to add specific information that we deem important).
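These figures are produced with =ggplotnim= (cf. the build dependencies below). As a minimal, self-contained sketch of an individual raw-data plot (column names, data and output path are purely illustrative, not the tool's internals):

#+begin_src nim
import ggplotnim

# Illustrative data only; the real tool reads these values from the input CSV.
let df = toDf({ "trace size" : @[1e3, 1e4, 1e5],
                "user time"  : @[0.12, 0.95, 8.4] })
ggplot(df, aes("trace size", "user time")) +
  geom_point() +
  ggtitle("Raw data: user time against trace size") +
  ggsave("/tmp/example_plot.pdf") # plots default to /tmp, cf. --plotPath
#+end_src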
An example of an individual fit result (linear function):

An example of a combined grid plot of all traces for a single metric (log-linear function):
The tool supports comparing two input files for performance
improvements or regressions. This is done by using the additional
=--compare= argument and passing in a second CSV file:

#+begin_src sh
./benchmark_analyzer --raw -f <input_csv> --compare <comparison_csv>
#+end_src

A log file is written to =./logs/benchmark_analyzer.log= as well as to
stdout.
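The exact comparison logic is not documented here, but the defaults of =--regressionThr= (1.01), =--optimizationThr= (0.99) and =--perfDiffThr= (0.0) suggest ratio-based checks. A purely hypothetical sketch of how such thresholds might interact (not the tool's actual implementation):

#+begin_src nim
type CompareKind = enum
  ckRegression, ckImprovement, ckUnchanged

# Hypothetical classification of a single metric between two runs,
# based on the ratio of new to old value.
proc classify(oldVal, newVal: float,
              regressionThr = 1.01,   # default of --regressionThr
              optimizationThr = 0.99, # default of --optimizationThr
              perfDiffThr = 0.0       # default of --perfDiffThr
             ): CompareKind =
  result =
    if abs(newVal - oldVal) <= perfDiffThr:
      ckUnchanged                     # difference too small to report
    elif newVal / oldVal > regressionThr:
      ckRegression                    # e.g. more than 1% slower / more memory
    elif newVal / oldVal < optimizationThr:
      ckImprovement                   # e.g. more than 1% faster / less memory
    else:
      ckUnchanged

echo classify(100.0, 103.0) # -> ckRegression
#+end_src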
For the moment building the tool requires one to:
- install a recent version of Nim (as of writing <2025-02-04 Tue 18:43> 2.2 is the latest release). You can follow the installation instructions on the Nim website.
- install the =ggplotnim= dependencies (libcairo) following your operating system's instructions (if there was a Windows user for this tool, follow this).
- use =nimble= to get all dependencies:
  #+begin_src sh
  nimble setup
  #+end_src
- build the C shared library of mpfit, following the instructions here:
  https://github.com/Vindaar/nim-mpfit?tab=readme-ov-file#dependencies--installation
- build the binary of the tool (produces the binary =benchmark_analyzer= in the directory of this repo):
  #+begin_src sh
  nimble install
  #+end_src
  or manually via:
  #+begin_src sh
  nim c -d:release benchmark_analyzer
  #+end_src
- (optional) add the directory of the repo to your =PATH=, move the binary or create a symlink of your choice.
NOTE: I might improve the install situation in the future by either
automating the mpfit build process or replacing mpfit with a native
Nim implementation (nowadays we have a Levenberg-Marquardt
implementation in numericalnim).
- currently the uncertainties for the metrics (time or space) are hardcoded to 3% of the value. This does not really have any foundation! We need statistics to estimate realistic numbers!
- the starting parameters are simply set to =1= for all parameters. The underlying Levenberg-Marquardt non-linear least squares library, mpfit, generally does a good job of converging to good fit parameters regardless. See the fit sketch below.
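Putting the two notes above together, a fit then looks roughly like this (assuming the =fit= procedure as shown in nim-mpfit's README; the data here is invented purely for illustration):

#+begin_src nim
import mpfit, sequtils

proc linear(p: seq[float], x: float): float =
  result = p[0] * x + p[1]

let x  = @[1.0, 2.0, 3.0, 4.0]
let y  = @[2.1, 4.2, 5.9, 8.1] # invented example data
let ey = y.mapIt(it * 0.03)    # hardcoded 3% uncertainties (see note above)
let p0 = @[1.0, 1.0]           # all starting parameters set to 1
let (pRes, res) = fit(linear, p0, x, y, ey)
echo "Fitted parameters: ", pRes # `res` carries the fit statistics (χ², etc.)
#+end_src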
At the moment the input data needs to be a CSV file of the raw benchmark output. An example file is:
=./resources/valida-03-06-2025-07-36-b1f407b4ed2662a4932cf8d253893e9987b3c222_raw.csv=
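Based on the default column names shown in the help above, such a file is expected to start roughly like this (rows invented for illustration):

#+begin_example
Test Name,Iteration,Real Time,User Time,Memory (MB)
fibonacci,1,12.34,12.01,512.3
fibonacci,2,12.41,12.08,512.3
#+end_example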
At the moment the trace size information is given in the CSV file =resources/trace_sizes.csv= (the =--traceSizes= default),
which is generated by
We can imagine adding a lot of interesting features in the future:
- [ ] more detailed reporting of fit results (covariance matrix, …)
- [ ] automatic report generation beyond a log file
- [X] highlighting of outliers
- [ ] generation of structured output data for further processing by another tool (e.g. for immediate reporting of performance regressions)
- [X] statistical analyses of aggregates of multiple benchmark runs once we have statistics
- [ ] bootstrap resampling of existing data
- and probably lots more…