Releases: eugr/llama-benchy

v0.3.5 - model autodetection

12 Mar 06:16

Changes in this release:

  • Added model name autodetection (if --model is omitted)
  • Updated tokenizer argument description
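With autodetection, a minimal run no longer needs --model. A sketch of an invocation (only --model is taken from these notes; the --base-url flag and the URL below are hypothetical placeholders, not confirmed option names):

```shell
# Before v0.3.5 the model name had to be passed explicitly:
#   llama-benchy --model <name> ...
# From v0.3.5, when --model is omitted, the model name is detected
# automatically from the server.
# NOTE: --base-url and the endpoint URL are illustrative placeholders.
llama-benchy --base-url http://localhost:8080/v1
```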

v0.3.4 - minor improvements

02 Mar 23:02

Changes in this release:

  • Added --exit-on-first-fail and --no-results-on-fail flags
  • Now requires transformers 5.2.x for more tokenizer choices
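The two new flags can be combined to abort quickly on errors. A hedged sketch (only --exit-on-first-fail and --no-results-on-fail come from this release; the rest is a placeholder):

```shell
# Stop the run at the first failed request, and suppress the results
# output when any request has failed.
# NOTE: --base-url and the endpoint URL are illustrative placeholders.
llama-benchy --base-url http://localhost:8080/v1 \
  --exit-on-first-fail \
  --no-results-on-fail
```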

v0.3.3 - bugfixes

27 Feb 04:23

Fixes

  • Fixed coherence test crash when API returns null content field (e.g. some thinking models)
  • Print partial results when interrupted

v0.3.2 - coherence test

26 Feb 18:43

Added skippable model coherence testing that runs before benchmarks.

v0.3.1 - SGLang fixes

16 Feb 05:17

  • Fixed an issue with SGLang not providing prompt tokens in usage stats
  • Removed per-request stats from the Markdown table when concurrency == 1

v0.3.0 - JSON improvements

07 Feb 20:25

  • Added an option to include total throughput and per-request time-series data in the JSON output.
  • Added JSON schema.
  • Added sample JSON file with embedded documentation (in JSONC format).
  • Added peak t/s to the output (for total throughput and per request).
  • Added a sample Jupyter Notebook with visualization.

v0.2.1 - improvements

06 Feb 01:02

  • Added peak throughput
  • Improved output t/s accuracy

v0.2.0 - major update

05 Feb 23:36

A major update that brings the following functionality:

  • Added concurrency testing
  • Added JSON and CSV output formats
  • Added ability to save benchmarks to file

v0.1.2 - Improved compatibility with Nemotron models

01 Feb 08:00

Added support for the different reasoning headers used by models like Nemotron Nano.

v0.1.1 - cosmetic changes

07 Jan 00:13

Fixed version numbering