-
Notifications
You must be signed in to change notification settings - Fork 92
[GuideLLM Refactor] Updates and Fixes for benchmark outputs, schemas, and stats calculations #442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR performs a significant refactoring to reorganize the codebase's statistics, schemas, and benchmarking components. The main changes consolidate utilities into schemas, remove deprecated presentation modules, and restructure the test suite to align with the new organization.
Key changes:
- Moved statistics utilities from
utils/statistics.pytoschemas/statistics.pyand updated to use probability density functions (PDFs) instead of cumulative distribution functions (CDFs) - Relocated pydantic utilities from
utils/pydantic_utils.pytoschemas/base.py - Removed deprecated presentation modules (
injector.py,data_models.py,builder.py) - Complete rewrite of statistics tests to use parametrized fixtures and broader distribution coverage
- Added new console utilities for formatted table printing
Reviewed Changes
Copilot reviewed 59 out of 61 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/utils/test_statistics.py | Complete rewrite with parametrized fixtures testing multiple probability distributions |
| tests/unit/utils/test_pydantic_utils.py | Updated import path from utils to schemas |
| tests/unit/presentation/* | Removed deprecated presentation test files |
| tests/unit/mock_benchmark.py | Updated class name from BenchmarkSchedulerStats to BenchmarkSchedulerMetrics |
| src/guidellm/utils/statistics.py | Deleted - moved to schemas/statistics.py |
| src/guidellm/schemas/statistics.py | New file with refactored statistics using PDF-based approach |
| src/guidellm/schemas/base.py | New file consolidating pydantic utilities |
| src/guidellm/utils/console.py | Enhanced with table printing capabilities and improved documentation |
| src/guidellm/utils/functions.py | Added safe_format_number utility |
| Multiple schema files | Updated imports and restructured benchmark schemas |
Comments suppressed due to low confidence (3)
src/guidellm/data/preprocessors/formatters.py:44
- This class does not call RequestFormatter.init during initialization. (GenerativeTextCompletionsRequestFormatter.init may be missing a call to a base class init)
class GenerativeTextCompletionsRequestFormatter(RequestFormatter):
src/guidellm/data/preprocessors/formatters.py:118
- This class does not call RequestFormatter.init during initialization. (GenerativeChatCompletionsRequestFormatter.init may be missing a call to a base class init)
class GenerativeChatCompletionsRequestFormatter(RequestFormatter):
src/guidellm/data/preprocessors/formatters.py:307
- This class does not call RequestFormatter.init during initialization. (GenerativeAudioTranscriptionRequestFormatter.init may be missing a call to a base class init)
class GenerativeAudioTranscriptionRequestFormatter(RequestFormatter):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Ran this benchmark: guidellm benchmark
--target http://vllm-standalone-granite-3-2b.llmd.svc.cluster.local \
--data "prompt_tokens=4096,prompt_tokens_stdev=512,prompt_tokens_min=2048,prompt_tokens_max=6144,output_tokens=512,output_tokens_stdev=128,output_tokens_min=1,output_tokens_max=1024" \
--max-seconds 10 \
--profile concurrent \
--rate 10And got the following error: ...
File "/root/guidellm/src/guidellm/benchmark/benchmarker.py", line 161, in run
benchmark = benchmark_class.compile(
accumulator=accumulator,
scheduler_state=scheduler_state,
)
File "/root/guidellm/src/guidellm/benchmark/schemas/generative/benchmark.py", line 134, in compile
metrics=GenerativeMetrics.compile(accumulator),
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/root/guidellm/src/guidellm/benchmark/schemas/generative/metrics.py", line 797, in compile
incomplete = accumulator.incomplete.get_within_range(start_time, end_time)
File "/root/guidellm/src/guidellm/benchmark/schemas/generative/accumulator.py", line 623, in get_within_range
if (stats.request_end_time >= start_time)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/guidellm/src/guidellm/schemas/request_stats.py", line 81, in request_end_time
raise ValueError("resolve_end timings should be set but is None.")
ValueError: resolve_end timings should be set but is None.Edit: Seems to be due to max-seconds constraint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the max-seconds bug here are some high-level notes:
- Needs signoff
- Needs cleanup. To retroactively run pre-commit on only the changes:
pre-commit run --from $(git merge-base main@{u} HEAD) --to HEAD - Only glanced over the accumulator and statistics code. Will do a more in-depth review time permitting but don't block on it.
- Is warmup/cooldown only seconds now and not sometimes a percent? Setting
--warmup .1results in a table entry ofWarm Sec: 0.1.
bfc4b2f to
4a951c4
Compare
…or the latest state of refactor Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
…ons, metrics, and outputs Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
Co-authored-by: Samuel Monson <[email protected]> Signed-off-by: Mark Kurtz <[email protected]> Signed-off-by: Mark Kurtz <[email protected]>
Signed-off-by: Mark Kurtz <[email protected]>
73e7539 to
6c7f133
Compare
Signed-off-by: Mark Kurtz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that the max-seconds error is fixed. I'll do more testing tomorrow, but it works for me.
The new CLI table outputs are a little busy to look at, but work. Adding more horizontal padding may help, but isn't necessary.
| self._add_field( | ||
| headers, | ||
| values, | ||
| "Run Info", | ||
| "Requests", | ||
| json.dumps(benchmark.config.requests), | ||
| ) | ||
| self._add_field( | ||
| headers, values, "Run Info", "Backend", json.dumps(benchmark.config.backend) | ||
| ) | ||
| self._add_field( | ||
| headers, | ||
| values, | ||
| "Run Info", | ||
| "Environment", | ||
| json.dumps(benchmark.config.environment), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requests and environment values are not the most pretty, but it works and doesn't exclude any information.
The format doesn't preserve the format of the original input, like it does on main right now, but if that's okay then I'm okay with proceeding.
| worker = WorkerProcess[RequestT, ResponseT]( | ||
| worker_index=rank, | ||
| messaging=self.messaging.create_worker_copy( | ||
| messaging=self.messaging.create_worker_copy( # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make sense to explain this ignored type error. If someone tries to fix it, this one is a confusing one.
Summary
Details
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##)