Releases: eugr/llama-benchy

v0.3.5 - model autodetection

12 Mar 06:16

Changes in this release:

  • Added model name autodetection (if --model is omitted)
  • Updated tokenizer argument description
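With autodetection, a minimal run no longer needs --model. A sketch of an invocation (only --model is taken from these notes; the --base-url flag and the URL below are hypothetical placeholders, not confirmed option names):

```shell
# Before v0.3.5 the model name had to be passed explicitly:
#   llama-benchy --model <name> ...
# From v0.3.5, when --model is omitted, the model name is detected
# automatically from the server.
# NOTE: --base-url and the endpoint URL are illustrative placeholders.
llama-benchy --base-url http://localhost:8080/v1
```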

v0.3.4 - minor improvements

02 Mar 23:02

Changes in this release:

  • Added --exit-on-first-fail and --no-results-on-fail flags
  • Now requires transformers 5.2.x for more tokenizer choices
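The two new flags can be combined to abort quickly on errors. A hedged sketch (only --exit-on-first-fail and --no-results-on-fail come from this release; the rest is a placeholder):

```shell
# Stop the run at the first failed request, and suppress the results
# output when any request has failed.
# NOTE: --base-url and the endpoint URL are illustrative placeholders.
llama-benchy --base-url http://localhost:8080/v1 \
  --exit-on-first-fail \
  --no-results-on-fail
```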

v0.3.3 - bugfixes

27 Feb 04:23

Fixes

  • Fixed coherence test crash when API returns null content field (e.g. some thinking models)
  • Print partial results when interrupted

v0.3.2 - coherence test

26 Feb 18:43

Added skippable model coherence testing that runs before benchmarks.

v0.3.1 - SGLang fixes

16 Feb 05:17

  • Fixed an issue with SGLang not providing prompt tokens in usage stats
  • Removed per-request stats from the Markdown table when concurrency == 1

v0.3.0 - JSON improvements

07 Feb 20:25

  • Added an option to include total throughput and per-request time-series data in the JSON output.
  • Added JSON schema.
  • Added sample JSON file with embedded documentation (in JSONC format).
  • Added peak t/s to the output (for total throughput and per request).
  • Added a sample Jupyter Notebook with visualization.

v0.2.1 - improvements

06 Feb 01:02

  • Added peak throughput
  • Improved output t/s accuracy

v0.2.0 - major update

05 Feb 23:36

A major update that brings the following functionality:

  • Added concurrency testing
  • Added JSON and CSV output formats
  • Added ability to save benchmarks to file

v0.1.2 - Improved compatibility with Nemotron models

01 Feb 08:00

Added support for the different reasoning headers used by models like Nemotron Nano.

v0.1.1 - cosmetic changes

07 Jan 00:13

Fixed version numbering