Releases: eugr/llama-benchy
v0.3.5 - model autodetection
v0.3.4 - minor improvements
Changes:
- Added --exit-on-first-fail and --no-results-on-fail flags
- Now requires transformers 5.2.x for more tokenizer choices
v0.3.3 - bugfixes
Fixes
- Fixed coherence test crash when API returns null content field (e.g. some thinking models)
- Print partial results when interrupted
v0.3.2 - coherence test
Added skippable model coherence testing that runs before benchmarks.
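A coherence test of this kind typically sends a short prompt and sanity-checks the reply before benchmarking begins. Below is a minimal sketch of the idea; the `generate` callable and the specific check are hypothetical stand-ins, not llama-benchy's actual implementation.

```python
# Hypothetical sketch of a pre-benchmark coherence check.
# `generate` stands in for a call to the model server; the real
# llama-benchy logic is not shown here.
def coherence_check(generate, prompt="Reply with the single word: ready"):
    reply = generate(prompt)
    # Guard against a null/empty content field (e.g. some thinking
    # models return null content, per the v0.3.3 fix above).
    if reply is None or not reply.strip():
        return False
    return "ready" in reply.lower()

print(coherence_check(lambda p: "Ready."))  # True
print(coherence_check(lambda p: None))      # False
```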
v0.3.1 - SGLang fixes
- Fixed an issue with SGLang not providing prompt tokens in usage stats
- Removed per-request stats from the Markdown table when concurrency == 1
v0.3.0 - JSON improvements
- Added an option to include total throughput and per-request time-series data in the JSON output.
- Added JSON schema.
- Added sample JSON file with embedded documentation (in JSONC format).
- Added peak t/s to the output (for total throughput and per request).
- Added a sample Jupyter Notebook with visualization.
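To illustrate how per-request time-series data can yield a peak t/s figure, here is a minimal sketch that computes peak throughput from a list of (elapsed_seconds, cumulative_tokens) samples. The sample format and function name are assumptions for illustration, not llama-benchy's actual JSON schema.

```python
# Hypothetical sketch: derive peak tokens/s from cumulative token samples.
# The (elapsed_seconds, cumulative_tokens) format is an assumption,
# not the actual llama-benchy JSON schema.
def peak_tps(samples):
    """samples: list of (elapsed_seconds, cumulative_tokens), sorted by time."""
    peak = 0.0
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            # Throughput over this interval; keep the maximum seen.
            peak = max(peak, (n1 - n0) / dt)
    return peak

samples = [(0.0, 0), (1.0, 40), (2.0, 95), (3.0, 140)]
print(peak_tps(samples))  # 55.0 for this sample data
```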
v0.2.1 - improvements
- Added peak throughput
- Improved output t/s accuracy
v0.2.0 - major update
This release brings the following functionality:
- Added concurrency testing
- Added JSON and CSV output formats
- Added ability to save benchmarks to file
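Concurrency testing of this sort amounts to running several requests in parallel and measuring aggregate throughput. A minimal sketch under stated assumptions, with `fake_request` as a stand-in for a real client call rather than llama-benchy's own code:

```python
# Hypothetical sketch of concurrency testing: run N dummy "requests"
# in parallel and compute aggregate tokens/s. fake_request is a
# stand-in for an actual model server call.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    time.sleep(0.05)  # simulate generation latency
    return 20         # tokens "produced" by this request

def total_throughput(concurrency=4, requests=8):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        tokens = sum(pool.map(fake_request, range(requests)))
    elapsed = time.perf_counter() - start
    return tokens / elapsed  # aggregate tokens per second

print(total_throughput() > 0)
```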
v0.1.2 - Improved compatibility with Nemotron models
Added support for different reasoning headers used by models like Nemotron Nano.
v0.1.1 - cosmetic changes
Fixed version numbering