
Benchmark analyzer

This is the (start of a) little command line tool to analyze benchmark results from our automated benchmark CI.

At the moment <2025-02-04 Tue 18:16> the tool receives the path to a CSV file with benchmark results for different VM versions. It then performs a fit of choice (currently either a linear or a log-linear function) and produces plots including the fitted function. The fit results are both printed to the terminal and written to a log file.

Usage

The basic usage looks like:

./benchmark_analyzer --raw -f <input_csv>

but multiple optional arguments are supported:

NO_COLOR=true ./benchmark_analyzer -h
Usage:
  benchmark_analyzer [optional-params] 
options:
  -h, --help                                                  print this
                                                              cligen-erated help
  
  --help-syntax                                               advanced:
                                                              prepend,plurals,..
  
  -f=, --fname=      string       ""                          Input CSV file
                                                              with benchmarking
                                                              results.
  
  -p=, --plotPath=   string       "/tmp"                      Path where the
                                                              plot files are
                                                              written to.
  
  -l=, --logPath=    string       "./logs"                    Path where the log
                                                              file is written
                                                              to. CURRENTLY
                                                              IGNORED.
  
  --log10            bool         false                       If true all plots
                                                              will be log10.
  
  --fitFunc=         FitFunction  logLin                      Function used for
                                                              fitting.
  
  -r, --raw          bool         false                       Indicates if the
                                                              input is raw data
                                                              or aggregate.
  
  --logVerbosity=    LogVerbosity lvDefault                   select 1
                                                              LogVerbosity
  
  -t=, --traceSizes= string       "resources/trace_sizes.csv" Path to the CSV
                                                              file containing
                                                              trace sizes for
                                                              each benchmark
                                                              program.
  
  --testCol=         string       "Test Name"                 set testCol
  
  -i=, --iterCol=    string       "Iteration"                 set iterCol
  
  --rTimeCol=        string       "Real Time"                 set rTimeCol
  
  -u=, --uTimeCol=   string       "User Time"                 set uTimeCol
  
  -m=, --memCol=     string       "Memory (MB)"               set memCol
  
  -o=, --outlierThr= float        3.0                         set outlierThr
  
  --regressionThr=   float        1.01                        set regressionThr
  
  --optimizationThr= float        0.99                        set
                                                              optimizationThr
  
  -c=, --compare=    string       ""                          set compare
  
  --perfDiffThr=     float        0.0                         set perfDiffThr
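
For instance, a hypothetical run that fits a linear model, renders all plots as log10 and writes them to a custom directory could look like this (results.csv is a placeholder name):

./benchmark_analyzer --raw -f results.csv --fitFunc=linear --log10 -p=./plots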

The two supported models are linear, \(f(x) = a \cdot x + b\), and logLin, \(f(x) = a \cdot x \cdot \ln(x) + b \cdot x + c\).
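
As a minimal sketch of the parameter convention used in the log output shown below (illustrative Nim, not the repo's actual definitions):

import math

# The parameter vector `p` corresponds to P[0], P[1], P[2] in the fit log.
proc linear(p: seq[float], x: float): float =
  p[0] * x + p[1]

proc logLin(p: seq[float], x: float): float =
  p[0] * x * ln(x) + p[1] * x + p[2]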

By default (and currently always) the log file is written to ./logs/benchmark_analyzer.log. It contains output of the form:

[17:45:14] - INFO: Fitting data for: 8/9/24 -- Main trace size (field elements) against Max space (kB)
[17:45:14] - INFO: Fit result for fit function:
[17:45:14] - INFO: 	`result = p[0] * x + p[1]`
[17:45:14] - INFO: ------------------------------
  χ²      = 864.779    (4 DOF)
  χ²/dof  = 216.195
  NPAR    = 3
  NFREE   = 3
  NPEGGED = 0
  NITER   = 3
  NFEV    = 14
  P[0] = 0.0691786 +/- 0.00095791
  P[1] = 2886.51   +/- 274.384
  P[2] = 1         +/- 0e+00
[17:45:14] - INFO: ------------------------------
[17:45:14] - INFO: Fitting data for: 8/9/24 -- Permutation trace size (extension field elements) against Max space (kB)
[17:45:14] - INFO: Fit result for fit function:
[17:45:14] - INFO: 	`result = p[0] * x + p[1]`
[17:45:14] - INFO: ------------------------------
  χ²      = 906.715    (4 DOF)
  χ²/dof  = 226.679
  NPAR    = 3
  NFREE   = 3
  NPEGGED = 0
  NITER   = 3
  NFEV    = 14
  P[0] = 0.409423 +/- 0.00569218
  P[1] = -17124.8 +/- 459.657
  P[2] = 1        +/- 0e+00
[17:45:14] - INFO: ------------------------------
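
As an aside, the reduced χ² printed in the log is simply the χ² divided by the degrees of freedom, e.g. \(χ²/\text{dof} = 864.779 / 4 ≈ 216.195\) for the first fit. Values far above 1 indicate that the assumed uncertainties are too small, which is expected given the hardcoded 3% uncertainties (see the important notes below).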

Many plots are generated and stored in plotPath. We generate:

  • individual plots
    • of the raw data of each trace against each metric (user time, real time, space)
    • of the data and its fit against each metric with the fit parameters embedded into the figure
  • combined plots
    • for each metric, a grid of all traces with their fits. For the moment these do not include the fit results, as that would be a bit too crowded (we can decide to add specific information that we deem important).

An example of an individual fit result (linear function):

./media/Main__permutation_trace_size__field_elements__1_23_25_with_fit.svg

An example of a combined grid plot of all traces for a single metric (log-linear function):

./media/all_traces_Max_space__kB__with_fit.svg

Comparison of two benchmark results

The tool supports comparing two input files for performance improvements or regressions.

This is done by using the additional --compare argument and passing in a second CSV file:

./benchmark_analyzer --raw -f <input_csv> --compare <comparison_csv>

The comparison results are written to ./logs/benchmark_analyzer.log as well as printed to stdout.
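
The --regressionThr, --optimizationThr and --perfDiffThr options presumably control when a difference between the two runs is flagged (judging by the defaults of 1.01 and 0.99, the first two look like ratios; this is an assumption, check the source for the exact semantics). A hypothetical invocation loosening both thresholds to ±5%:

./benchmark_analyzer --raw -f new.csv --compare old.csv --regressionThr=1.05 --optimizationThr=0.95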

Build instructions

For the moment building the tool requires one to:

  1. install a recent version of Nim (as of writing <2025-02-04 Tue 18:43> 2.2 is the latest release). You can follow the installation instructions on the Nim website.
  2. install the ggplotnim dependencies (libcairo) following the installation instructions for your operating system (if there were ever a Windows user for this tool, the ggplotnim documentation covers it).
  3. use nimble to get all dependencies:
    nimble setup
        
  4. build the C shared library of mpfit by following the instructions here: https://github.com/Vindaar/nim-mpfit?tab=readme-ov-file#dependencies--installation
  5. build the binary of the tool (produces the binary benchmark_analyzer in the directory of this repo):
    nimble install
        

    or manually via:

    nim c -d:release benchmark_analyzer
        
  6. (optional) add the directory of the repo to your PATH, move the binary or create a symlink of your choice.
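
For the last step, a symlink is one option (target directory illustrative):

ln -s "$(pwd)/benchmark_analyzer" ~/.local/bin/benchmark_analyzer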

NOTE: I might improve the install situation in the future by either automating the mpfit build process or replacing mpfit with a native Nim implementation (nowadays we have a Levenberg-Marquardt implementation in numericalnim).

Important notes

  • currently the uncertainties for the metric (time or space) are hardcoded to 3% of the value. This does not really have any foundation! We need statistics to estimate realistic numbers!
  • starting parameters are simply set to 1 for all parameters. The underlying Levenberg-Marquardt non-linear least squares library used under the hood, mpfit, generally does a good job of converging even from these naive starting values. See the sketch below.
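
A minimal sketch of these two hardcoded choices (illustrative only; the names below are invented, not the repo's actual code):

import std/sequtils

# Illustrative: each metric value y is assigned σ = 0.03·y as its uncertainty.
proc uncertainty(y: float): float =
  0.03 * y

# Illustrative: all starting parameters are set to 1 (here for 3 parameters).
let p0 = newSeqWith(3, 1.0)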

Input data

At the moment the input data needs to be a CSV file of the raw benchmark output. An example file is:

./resources/valida-03-06-2025-07-36-b1f407b4ed2662a4932cf8d253893e9987b3c222_raw.csv
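
Based on the default column flags listed above, a hypothetical sketch of the raw input layout (program names and values invented for illustration):

Test Name,Iteration,Real Time,User Time,Memory (MB)
fibonacci,0,1.234,1.101,512.3
fibonacci,1,1.240,1.098,511.9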

At the moment the trace size information is provided in the following CSV file:

./resources/trace_sizes.csv

which is generated by

https://github.com/lita-xyz/valida-toolchain/blob/fb6368a2d40564a92259df4a1cda43819475b292/llvm-valida/benchmarking/get_trace_sizes.py
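
Judging by the fit labels in the example log above, it maps each benchmark program to its trace sizes; a guessed sketch of the layout (column names inferred, values invented):

Test Name,Main trace size (field elements),Permutation trace size (extension field elements)
fibonacci,1048576,2097152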

Possible future features

We can imagine adding a lot of interesting features in the future:

  • more detailed reporting of fit results (covariance matrix, …)
  • automatic report generation beyond a log file
  • [X] highlighting of outliers
  • generation of structured output data for further processing by another tool (e.g. for immediate reporting of performance regressions)
  • [X] statistical analyses of aggregates of multiple benchmark runs once we have statistics
  • bootstrap resampling of existing data
  • and probably lots more…
