GuideLLM v0.5.0
Overview
GuideLLM v0.5.0 is a small release that adds request throttling when the server is over-saturated. It also re-introduces the dataset preprocessing features and fixes various issues introduced in v0.4.0.
To get started, install with:
pip install guidellm[recommended]==0.5.0
Or from source with:
pip install 'guidellm[recommended] @ git+https://github.com/vllm-project/guidellm.git'@v0.5.0
Breaking Changes
- Throughput Mode: Throughput mode previously assumed a fixed rate.
- Migration: Throughput mode now requires manually specifying --rate if used outside of sweep mode.
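As a rough illustration of the migration, the sketch below assembles a standalone throughput invocation that always carries an explicit --rate. Only --rate comes from these notes. The --target, --profile, and --max-seconds flags, the localhost URL, and the helper name build_throughput_command are assumptions made for the example, so check them against guidellm benchmark --help before relying on them. The changelog entry for #467 suggests the rate acts as a concurrency cap in this mode, which is why the sketch treats it as a positive integer.

```python
import shlex


def build_throughput_command(target: str, rate: int, max_seconds: int = 30) -> list[str]:
    """Assemble an argv list for a standalone throughput benchmark (hypothetical helper)."""
    if rate <= 0:
        raise ValueError("rate must be a positive integer")
    return [
        "guidellm", "benchmark",
        "--target", target,                 # assumed flag, not named in these notes
        "--profile", "throughput",          # assumed flag name for selecting the mode
        "--rate", str(rate),                # required outside sweep mode as of v0.5.0
        "--max-seconds", str(max_seconds),  # assumed flag, not named in these notes
    ]


cmd = build_throughput_command("http://localhost:8000", rate=64)
print(shlex.join(cmd))  # shell-safe string that can be pasted into a terminal
```

Building the command as a list keeps it usable with subprocess.run without shell quoting concerns, and the early ValueError mirrors the new requirement that a rate must be supplied.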
What's New
- Comprehensive Preprocessing Guide: Major documentation update covering preprocessing configs, strategies for handling short prompts, advanced column mapping, and reproducibility controls.
- Over-Saturation Detection: Benchmarks now stop automatically when the LLM server is overloaded, with tunable saturation detection and in-depth docs.
What's Changed
- Dataset Preprocessing Command: Re-enabled the guidellm preprocess dataset command for custom prompt/output token sizing, column mapping, batch preprocessing, prompt strategies, and Hugging Face uploads.
- Benchmark CLI: Added --detect-saturation and --over-saturation for robust specification of saturation constraints.
What's Fixed
- Assorted documentation fixes: Documentation polish and outdated references removed.
- Dataset handler: Fixed an edge case where finite-length datasets would stall when exhausted.
- Connection limit: Removed the cap on the number of connections per worker, which was previously fixed at 100.
Compatibility Notes
- Python: 3.10–3.13
- OS: Linux, macOS
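A quick way to confirm an environment matches these notes is sketched below. It checks the interpreter against the 3.10 to 3.13 range listed above and reports the installed guidellm version, if any. The helper names are hypothetical; importlib.metadata is part of the standard library.

```python
import sys
from importlib import metadata

SUPPORTED_MIN = (3, 10)  # lowest Python version listed in the notes
SUPPORTED_MAX = (3, 13)  # highest Python version listed in the notes


def python_is_supported(version=None):
    """Check a (major, minor, ...) tuple, defaulting to the running interpreter."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def installed_guidellm_version():
    """Return the installed guidellm version string, or None if it is not installed."""
    try:
        return metadata.version("guidellm")
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("guidellm version:", installed_guidellm_version())
```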
Changelog
Bug fixes
- Unmask StopIteration in DataLoader by @sjmonson in #468
- fix encode_audio with dict input failure by @tukwila in #480
- ut for audio and vision encode function by @tukwila in #489
- Allow unlimited connections per-worker by @sjmonson in #488
New features
- Add over saturation constraint by @AlonKellner-RedHat in #438
- Add more metadata to benchmark report by @sjmonson in #497
- Add vllm id to the response by @toslali-ibm in #455
- Indicate max_concurrency for throughput and disallow running standalone without --rate by @sjmonson in #467
- Reenable and improve preprocess dataset by @jaredoconnell in #472
CI, Workflows & Packaging
- Fix container version mismatch by moving build type ARG after FROM by @yankay in #473
- update build versions in settings, workflow versioned build fix by @DaltheCow in #465
- Fixes and Cleanup of CI by @sjmonson in #469
- Fix test failures due to collecting 0 tests by @sjmonson in #485
- Fix container overriding output-dir/outputs by @sjmonson in #486
- Switch to uv in build, test, and CI by @sjmonson in #494
- Bump some old dependency locks by @sjmonson in #496
- UT for src/guidellm/data/deserializers/file.py by @tukwila in #495
New Contributors
- @toslali-ibm made their first contribution in #455
- @yankay made their first contribution in #473
Full Changelog: v0.4.0...v0.5.0