Optimum-Benchmark (🚧 WIP 🚧)

The Goal

A repository aiming to create a benchmarking utility for any model on HuggingFace's Hub, supporting Optimum's inference & training, optimizations & quantizations, on different backends & hardware (OnnxRuntime, Intel Neural Compressor, OpenVINO, Habana Gaudi Processor (HPU), etc.).

Experiment management and tracking are handled by hydra from the command line, with minimal configuration changes and maximum flexibility (inspired by tune).

Motivation

  • Many users would want to know how their chosen model performs (latency & throughput) before deploying it to production.
  • Many hardware vendors would want to know how their hardware performs on different models and how it compares to others.
  • Optimum offers many optimizations that can be applied to models to improve their performance, but it's hard to know which ones to use without knowing your hardware well. It's also hard to estimate how much these optimizations will improve performance before training your model or downloading it from the Hub and optimizing it.
  • Benchmarks depend heavily on many factors (machine, hardware, OS, library versions, etc.), but most of this information is not reported alongside the results, which makes most of the benchmarks available today of little use for decision making.
  • [...]

Features

General:

  • Latency and throughput tracking (default behavior)
  • Peak memory tracking (benchmark.memory=true)
  • Symbolic Profiling (benchmark.profile=true)
  • Input shapes control (e.g. benchmark.input_shapes.batch_size=8)
  • Random weights initialization (backend.no_weights=true; support depends on the backend)

Inference:

  • Pytorch backend for CPU
  • Pytorch backend for CUDA
  • Pytorch backend for Habana Gaudi Processor (HPU)
  • OnnxRuntime backend for CPUExecutionProvider
  • OnnxRuntime backend for CUDAExecutionProvider
  • Intel Neural Compressor backend for CPU
  • OpenVINO backend for CPU

Optimizations:

  • Pytorch's Automatic Mixed Precision
  • Optimum's BetterTransformer
  • Optimum's Optimization and AutoOptimization
  • Optimum's Quantization and AutoQuantization
  • Optimum's Calibration for Static Quantization
  • BitsAndBytes' quantization

Quickstart

Start by installing the required dependencies for your hardware and the backends you want to use. For example, if you're going to run GPU benchmarks, you can install the requirements with:

python -m pip install -r gpu_requirements.txt

Then install the package:

python -m pip install -e .

You can now run a benchmark from the command line by specifying the configuration directory and the configuration name; both arguments are mandatory. --config-dir is the directory where the configuration files are stored and --config-name is the name of the configuration file without its .yaml extension.

optimum-benchmark --config-dir examples/ --config-name pytorch

This will run the benchmark using the configuration in examples/pytorch.yaml and store the results in runs/pytorch.

The result files are inference_results.csv, the program's logs in main.log, and the configuration that was used in hydra_config.yaml.

The directory for storing these results can be changed by setting hydra.run.dir (and/or hydra.sweep.dir in the case of a multirun) on the command line or in the config file (see base_config.yaml).
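
For example (a sketch, assuming you want the outputs of a single run stored under a custom folder):

optimum-benchmark --config-dir examples/ --config-name pytorch hydra.run.dir=runs/my_experiment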

Command-line configuration overrides

It's easy to override the default behavior of a benchmark from the command line.

optimum-benchmark --config-dir examples/ --config-name pytorch model=gpt2 device=cuda:1
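
The options listed under Features can be toggled the same way. For example (a sketch, assuming the pytorch example config and a backend that supports memory tracking):

optimum-benchmark --config-dir examples/ --config-name pytorch benchmark.memory=true benchmark.input_shapes.batch_size=8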

Multirun configuration sweeps

You can easily run configuration sweeps using the -m or --multirun option. By default, configurations are executed serially, but other execution modes are supported with hydra's launcher plugins: hydra/launcher=submitit_local, hydra/launcher=ray, etc.

optimum-benchmark --config-dir examples --config-name pytorch -m device=cpu,cuda

Also, for integer parameters like batch_size, one can specify a range of values to sweep over:

optimum-benchmark --config-dir examples --config-name pytorch -m device=cpu,cuda benchmark.input_shapes.batch_size='range(1,10,step=2)'
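
Sweeps over several parameters can also be combined, in which case hydra launches one run per combination. For example (a sketch reusing the overrides shown above; distilgpt2 is just an illustrative Hub model):

optimum-benchmark --config-dir examples --config-name pytorch -m model=gpt2,distilgpt2 device=cpu,cuda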

Reporting benchmark results (WIP)

To aggregate the results of a benchmark (run(s) or sweep(s)), you can use the optimum-report command.

optimum-report --experiments {experiments_folder_1} {experiments_folder_2} --baseline {baseline_folder} --report-name {report_name}

This will create a report in the reports folder with the name {report_name}. The report will contain the results of the experiments in {experiments_folder_1} and {experiments_folder_2} compared to the results of the baseline in {baseline_folder} in the form of a .csv file, an .svg rich table and (a) .png plot(s).
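
For example, with illustrative folder names (assuming two finished runs under runs/ and a baseline run to compare against):

optimum-report --experiments runs/pytorch runs/onnxruntime --baseline runs/pytorch_baseline --report-name pytorch_vs_onnxruntime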

Configurations structure

You can create custom configuration files following the examples here. The easiest way to do so is by using hydra's composition with a base configuration examples/base_config.yaml.

To create a configuration that uses a wav2vec2 model and onnxruntime backend, it's as easy as:

defaults:
  - base_config
  - _self_
  - override backend: onnxruntime

experiment_name: onnxruntime_wav2vec2

model: bookbot/distil-wav2vec2-adult-child-cls-37m
device: cpu
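
Assuming this file is saved as examples/onnxruntime_wav2vec2.yaml, it can then be run like any other configuration:

optimum-benchmark --config-dir examples/ --config-name onnxruntime_wav2vec2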

Some examples are provided in the tests/configs folder for different backends and models.

TODO

  • Add support for any kind of input (text, audio, image, etc.)
  • Add support for onnxruntime backend
  • Add support for optimum quantization
  • Add support for optimum graph optimizations
  • Add support for static quantization + calibration.
  • Add support for profiling nodes/kernels execution time.
  • Add experiments aggregator to report on data from different runs/sweeps.
  • Add support for latency optimization with hydra's sweepers (optuna, nevergrad, etc.)
  • Add support for more metrics (memory usage, node execution time, etc.)
  • Migrate configuration management to be handled solely by config store.
  • Add Dana client to send results to the dashboard (WIP)
  • Make a consistent reporting utility.
  • Add Pydantic for schema validation.
  • Add support for sparse inputs.
  • ...
