Release GuideLLM v0.5.0 · vllm-project/guidellm

Overview

GuideLLM v0.5.0 is a small release that adds request throttling when the server is over-saturated. It also re-introduces dataset preprocessing features and fixes various issues introduced in v0.4.0.

To get started, install with:

pip install 'guidellm[recommended]==0.5.0'

Or from source with:

pip install 'guidellm[recommended] @ git+https://github.com/vllm-project/guidellm.git@v0.5.0'
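
After installing, it can be useful to confirm which version actually landed in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of GuideLLM, and it assumes the distribution is published under the name `guidellm`, as in the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "0.5.0"  # the release described in these notes
    found = installed_version("guidellm")
    if found is None:
        print("guidellm is not installed in this environment")
    elif found == expected:
        print(f"guidellm {found} is installed")
    else:
        # A source install may report a different version string.
        print(f"guidellm {found} is installed, expected {expected}")
```

A source install from the tag may report a slightly different version string, so the sketch reports a mismatch rather than failing.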

Breaking Changes

Throughput Mode: Throughput mode previously assumed a fixed rate.
- Migration: Throughput mode now requires manually specifying --rate if used outside of sweep mode.

What's New

Comprehensive Preprocessing Guide: Major documentation update covering preprocessing configs, strategies for handling short prompts, advanced column mapping, and reproducibility controls.
Over-Saturation Detection: Benchmarks now stop automatically when the LLM server is overloaded, with tunable saturation detection and in-depth documentation.
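
To give a rough intuition for what over-saturation detection means, the sketch below flags a server as saturated when the number of in-flight requests keeps growing over time. This is an illustrative heuristic only and is not GuideLLM's actual algorithm: the function names, the least-squares slope test, and the `min_samples` and `max_slope` thresholds are all assumptions made for this example.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        return 0.0  # all samples share one timestamp, so no trend can be measured
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom


def looks_over_saturated(timestamps, in_flight, min_samples=5, max_slope=0.5):
    """Hypothetical heuristic: True when in-flight requests grow too fast.

    timestamps: sample times in seconds.
    in_flight:  number of requests awaiting a response at each sample.
    max_slope:  assumed threshold, in requests per second.
    """
    if len(timestamps) < min_samples:
        return False  # too little data to judge a trend
    return slope(timestamps, in_flight) > max_slope


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    print(looks_over_saturated(times, [1, 2, 3, 4, 5]))  # backlog grows steadily
    print(looks_over_saturated(times, [3, 3, 3, 3, 3]))  # backlog stays flat
```

A growing backlog suggests the server is completing requests more slowly than they arrive, which is the condition under which continuing a benchmark yields little useful data.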

What's Changed

Dataset Preprocessing Command: Re-enabled the guidellm preprocess dataset command for custom prompt/output token sizing, column mapping, batch preprocessing, prompt strategies, and Hugging Face uploads.
Benchmark CLI: Added --detect-saturation and --over-saturation for robust specification of saturation constraints.

What's Fixed

Assorted documentation fixes: Documentation polish and outdated references removed.
Dataset handler: Fixed an edge case where finite-length datasets would stall when exhausted.
Connection limit: Removed the cap on the number of connections per worker, which was previously fixed at 100.

Compatibility Notes

Python: 3.10–3.13
OS: Linux, macOS
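
A quick way to check an interpreter against the Python range above is sketched below. The helper `is_supported` and the two constants are hypothetical names introduced for this example; the bounds are taken directly from the compatibility notes (3.10 through 3.13).

```python
import sys

# Supported range from the compatibility notes: Python 3.10 through 3.13.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 13)


def is_supported(version: tuple) -> bool:
    """Return True if a (major, minor) version lies within the supported range."""
    return MIN_SUPPORTED <= tuple(version[:2]) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "outside the supported range"
    print(f"Python {current[0]}.{current[1]} is {status}")
```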

Changelog

Bug fixes

Unmask StopIteration in DataLoader by @sjmonson in #468
fix encode_audio with dict input failure by @tukwila in #480
ut for audio and vision encode function by @tukwila in #489
Allow unlimited connections per-worker by @sjmonson in #488

New features

Add over saturation constraint by @AlonKellner-RedHat in #438
Add more metadata to benchmark report by @sjmonson in #497
Add vllm id to the response by @toslali-ibm in #455
Indicate max_concurrency for throughput and disallow running standalone without --rate by @sjmonson in #467
Reenable and improve preprocess dataset by @jaredoconnell in #472

CI, Workflows & Packaging

Fix container version mismatch by moving build type ARG after FROM by @yankay in #473
update build versions in settings, workflow versioned build fix by @DaltheCow in #465
Fixes and Cleanup of CI by @sjmonson in #469
Fix test failures due to collecting 0 tests by @sjmonson in #485
Fix container overriding output-dir/outputs by @sjmonson in #486
Switch to uv in build, test, and CI by @sjmonson in #494
Bump some old dependency locks by @sjmonson in #496
UT for src/guidellm/data/deserializers/file.py by @tukwila in #495

New Contributors

@toslali-ibm made their first contribution in #455
@yankay made their first contribution in #473

Full Changelog: v0.4.0...v0.5.0
