This is the (start of a) little command line tool to analyze benchmark results from our automated benchmark CI.

At the moment <2025-02-04 Tue 18:16> the tool receives the path to a CSV file with benchmark results for different VM versions. It then performs a fit of choice (currently either a linear or a log-linear function) and produces plots of the fitted function. Further, the fit results are both printed to the terminal and written to a log file.
The basic usage looks like:
#+begin_src sh
./benchmark_analyzer --raw -f <input_csv>
#+end_src

but multiple optional arguments are supported:
#+begin_src sh
NO_COLOR=true ./benchmark_analyzer -h
#+end_src

#+begin_example
Usage:
  benchmark_analyzer [optional-params]
options:
  -h, --help                                                    print this cligen-erated help
  --help-syntax                                                 advanced: prepend,plurals,..
  -f=, --fname=        string       ""                           Input CSV file with benchmarking results.
  -p=, --plotPath=     string       "/tmp"                       Path where the plot files are written to.
  -l=, --logPath=      string       "./logs"                     Path where the log file is written to. CURRENTLY IGNORED.
  --log10              bool         false                        If true all plots will be log10.
  --fitFunc=           FitFunction  logLin                       Function used for fitting.
  -r, --raw            bool         false                        Indicates if the input is raw data or aggregate.
  --logVerbosity=      LogVerbosity lvDefault                    select 1 LogVerbosity
  -t=, --traceSizes=   string       "resources/trace_sizes.csv"  Path to the CSV file containing trace sizes for each benchmark program.
  --testCol=           string       "Test Name"                  set testCol
  -i=, --iterCol=      string       "Iteration"                  set iterCol
  --rTimeCol=          string       "Real Time"                  set rTimeCol
  -u=, --uTimeCol=     string       "User Time"                  set uTimeCol
  -m=, --memCol=       string       "Memory (MB)"                set memCol
  -o=, --outlierThr=   float        3.0                          set outlierThr
  --regressionThr=     float        1.01                         set regressionThr
  --optimizationThr=   float        0.99                         set optimizationThr
  -c=, --compare=      string       ""                           set compare
  --perfDiffThr=       float        0.0                          set perfDiffThr
#+end_example
The two supported models are =linear=, \(f(x) = a \cdot x + b\), and =logLin=,
\(f(x) = a \cdot x \cdot \ln(x) + b \cdot x + c\).
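For concreteness, here is a minimal Nim sketch of the two models in the =p[i]= parameter convention that also appears in the log output below (the proc names are illustrative, not necessarily the tool's actual code):

#+begin_src nim
import math

# Sketch of the two fit models (names illustrative, not the tool's actual code).
proc linear(p: seq[float], x: float): float =
  ## f(x) = a·x + b with a = p[0], b = p[1]
  result = p[0] * x + p[1]

proc logLin(p: seq[float], x: float): float =
  ## f(x) = a·x·ln(x) + b·x + c with a = p[0], b = p[1], c = p[2]
  result = p[0] * x * ln(x) + p[1] * x + p[2]
#+end_src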
By default (and currently always) the log file is written to =./logs/benchmark_analyzer.log=. It contains output of the form:

#+begin_example
[17:45:14] - INFO: Fitting data for: 8/9/24 -- Main trace size (field elements) against Max space (kB)
[17:45:14] - INFO: Fit result for fit function:
[17:45:14] - INFO: `result = p[0] * x + p[1]`
[17:45:14] - INFO: ------------------------------
  χ²      = 864.779 (4 DOF)
  χ²/dof  = 216.195
  NPAR    = 3
  NFREE   = 3
  NPEGGED = 0
  NITER   = 3
  NFEV    = 14
  P[0] = 0.0691786 +/- 0.00095791
  P[1] = 2886.51 +/- 274.384
  P[2] = 1 +/- 0e+00
[17:45:14] - INFO: ------------------------------
[17:45:14] - INFO: Fitting data for: 8/9/24 -- Permutation trace size (extension field elements) against Max space (kB)
[17:45:14] - INFO: Fit result for fit function:
[17:45:14] - INFO: `result = p[0] * x + p[1]`
[17:45:14] - INFO: ------------------------------
  χ²      = 906.715 (4 DOF)
  χ²/dof  = 226.679
  NPAR    = 3
  NFREE   = 3
  NPEGGED = 0
  NITER   = 3
  NFEV    = 14
  P[0] = 0.409423 +/- 0.00569218
  P[1] = -17124.8 +/- 459.657
  P[2] = 1 +/- 0e+00
[17:45:14] - INFO: ------------------------------
#+end_example
Many plots are generated and stored in =plotPath= (see the plotting sketch after this list). We generate:
- individual plots
  - of the raw data of each trace against each metric (user time, real time, space)
  - of the data and its fit against each metric, with the fit parameters embedded into the figure
- combined plots
  - for each metric, a grid of all traces with their fits. For the moment these do not include the fit parameters, as that would be a bit too crowded (we can decide to add specific information that we deem important).
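These figures are produced with =ggplotnim= (cf. the build dependencies below). As a minimal, self-contained sketch of an individual raw-data plot (column names, data and output path are purely illustrative, not the tool's internals):

#+begin_src nim
import ggplotnim

# Illustrative data only; the real tool reads these values from the input CSV.
let df = toDf({ "trace size" : @[1e3, 1e4, 1e5],
                "user time"  : @[0.12, 0.95, 8.4] })
ggplot(df, aes("trace size", "user time")) +
  geom_point() +
  ggtitle("Raw data: user time against trace size") +
  ggsave("/tmp/example_plot.pdf") # plots default to /tmp, cf. --plotPath
#+end_src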
An example of an individual fit result (linear function):

An example of a combined grid plot of all traces for a single metric (log-linear function):
The tool supports comparing two input files for performance
improvements or regressions. This is done by using the additional
=--compare= argument and passing in a second CSV file:

#+begin_src sh
./benchmark_analyzer --raw -f <input_csv> --compare <comparison_csv>
#+end_src

A log file is written to =./logs/benchmark_analyzer.log= as well as to
stdout.
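The exact comparison logic is not documented here, but the defaults of =--regressionThr= (1.01), =--optimizationThr= (0.99) and =--perfDiffThr= (0.0) suggest ratio-based checks. A purely hypothetical sketch of how such thresholds might interact (not the tool's actual implementation):

#+begin_src nim
type CompareKind = enum
  ckRegression, ckImprovement, ckUnchanged

# Hypothetical classification of a single metric between two runs,
# based on the ratio of new to old value.
proc classify(oldVal, newVal: float,
              regressionThr = 1.01,   # default of --regressionThr
              optimizationThr = 0.99, # default of --optimizationThr
              perfDiffThr = 0.0       # default of --perfDiffThr
             ): CompareKind =
  result =
    if abs(newVal - oldVal) <= perfDiffThr:
      ckUnchanged                     # difference too small to report
    elif newVal / oldVal > regressionThr:
      ckRegression                    # e.g. more than 1% slower / more memory
    elif newVal / oldVal < optimizationThr:
      ckImprovement                   # e.g. more than 1% faster / less memory
    else:
      ckUnchanged

echo classify(100.0, 103.0) # -> ckRegression
#+end_src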
For the moment building the tool requires one to:
- install a recent version of Nim (as of writing <2025-02-04 Tue 18:43> 2.2 is the latest release). You can follow the installation instructions on the Nim website.
- install the =ggplotnim= dependencies (libcairo) following your operating system's instructions (if there was a Windows user for this tool, follow this).
- use =nimble= to get all dependencies:
  #+begin_src sh
  nimble setup
  #+end_src
- build the C shared library of mpfit, following the instructions here:
  https://github.com/Vindaar/nim-mpfit?tab=readme-ov-file#dependencies--installation
- build the binary of the tool (produces the binary =benchmark_analyzer= in the directory of this repo):
  #+begin_src sh
  nimble install
  #+end_src
  or manually via:
  #+begin_src sh
  nim c -d:release benchmark_analyzer
  #+end_src
- (optional) add the directory of the repo to your =PATH=, move the binary or create a symlink of your choice.
NOTE: I might improve the install situation in the future by either
automating the mpfit build process or replacing mpfit with a native
Nim implementation (nowadays we have a Levenberg-Marquardt
implementation in numericalnim).
- currently the uncertainties for the metrics (time or space) are hardcoded to 3% of the value. This does not really have any foundation! We need statistics to estimate realistic numbers!
- the starting parameters are simply set to =1= for all parameters. The underlying Levenberg-Marquardt non-linear least squares library, mpfit, generally does a good job of converging to good fit parameters regardless. See the fit sketch below.
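Putting the two notes above together, a fit then looks roughly like this (assuming the =fit= procedure as shown in nim-mpfit's README; the data here is invented purely for illustration):

#+begin_src nim
import mpfit, sequtils

proc linear(p: seq[float], x: float): float =
  result = p[0] * x + p[1]

let x  = @[1.0, 2.0, 3.0, 4.0]
let y  = @[2.1, 4.2, 5.9, 8.1] # invented example data
let ey = y.mapIt(it * 0.03)    # hardcoded 3% uncertainties (see note above)
let p0 = @[1.0, 1.0]           # all starting parameters set to 1
let (pRes, res) = fit(linear, p0, x, y, ey)
echo "Fitted parameters: ", pRes # `res` carries the fit statistics (χ², etc.)
#+end_src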
At the moment the input data needs to be a CSV file of the raw benchmark output. An example file is:
=./resources/valida-03-06-2025-07-36-b1f407b4ed2662a4932cf8d253893e9987b3c222_raw.csv=
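Based on the default column names shown in the help above, such a file is expected to start roughly like this (rows invented for illustration):

#+begin_example
Test Name,Iteration,Real Time,User Time,Memory (MB)
fibonacci,1,12.34,12.01,512.3
fibonacci,2,12.41,12.08,512.3
#+end_example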
At the moment the trace size information is given in the CSV file =resources/trace_sizes.csv= (the =--traceSizes= default),
which is generated by
We can imagine adding a lot of interesting features in the future:
- [ ] more detailed reporting of fit results (covariance matrix, …)
- [ ] automatic report generation beyond a log file
- [X] highlighting of outliers
- [ ] generation of structured output data for further processing by another tool (e.g. for immediate reporting of performance regressions)
- [X] statistical analyses of aggregates of multiple benchmark runs once we have statistics
- [ ] bootstrap resampling of existing data
- and probably lots more…