[Feature] Opt metrics structure #891
Changes from 65 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| # Production Metrics | ||
|
|
||
| ## Usage | ||
|
|
||
| These metrics let you monitor the health and performance of a vLLM-omni deployment in production. To collect them, start vLLM-omni with the `--log-stats` flag, which exposes the metrics through structured logs. Key scenarios include: | ||
| - **Performance monitoring**: Track throughput (e.g., `e2e_avg_tokens_per_s`), latency (e.g., `e2e_total_ms`), and resource utilization to verify that the system meets expected performance standards. | ||
| - **Debugging and troubleshooting**: Use the detailed per-request metrics to diagnose issues with specific requests, such as high transfer times or unexpected token counts. | ||
|
|
||
| ## Overall Summary | ||
|
||
|
|
||
| | Field | Meaning | | ||
| |---------------------------|----------------------------------------------------------------------------------------------| | ||
| | `e2e_requests` | Number of completed requests. | | ||
| | `e2e_wall_time_ms` | Wall-clock time span from run start to last completion, in ms. | | ||
| | `e2e_total_tokens` | Total tokens counted across all completed requests (stage0 input + all stage outputs). | | ||
| | `e2e_avg_time_per_request_ms` | Average wall time per request: `e2e_wall_time_ms / e2e_requests`. | | ||
| | `e2e_avg_tokens_per_s` | Average token throughput over wall time: `e2e_total_tokens * 1000 / e2e_wall_time_ms`. | | ||
| | `stage_wall_time_ms` | Wall-clock time span for each stage, in ms (list format). | | ||
|
|
||
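As a rough illustration of how the two derived fields follow from the raw counters, the sketch below recomputes them from `e2e_requests`, `e2e_wall_time_ms`, and `e2e_total_tokens` using the formulas in the table. The function name `derive_overall_metrics` and the choice to return `0.0` when a denominator is zero are assumptions made for this example; they are not taken from the vLLM-omni code.

```python
from __future__ import annotations


def derive_overall_metrics(
    e2e_requests: int,
    e2e_wall_time_ms: float,
    e2e_total_tokens: int,
) -> dict[str, float]:
    """Recompute the derived overall-summary fields from the raw counters.

    Hypothetical helper: the zero guards are an assumption of this sketch.
    """
    # e2e_avg_time_per_request_ms = e2e_wall_time_ms / e2e_requests
    avg_time_per_request_ms = (
        e2e_wall_time_ms / e2e_requests if e2e_requests > 0 else 0.0
    )
    # e2e_avg_tokens_per_s = e2e_total_tokens * 1000 / e2e_wall_time_ms
    avg_tokens_per_s = (
        e2e_total_tokens * 1000.0 / e2e_wall_time_ms if e2e_wall_time_ms > 0 else 0.0
    )
    return {
        "e2e_avg_time_per_request_ms": avg_time_per_request_ms,
        "e2e_avg_tokens_per_s": avg_tokens_per_s,
    }


# Example: 4 requests, 500 tokens in total, finished within 2000 ms of wall time.
metrics = derive_overall_metrics(4, 2000.0, 500)
```

With these inputs the average time per request is 500 ms and the throughput is 250 tokens per second. Note that the average divides wall time by request count, so with concurrent requests it can be lower than any single request's `e2e_total_ms`.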
| --- | ||
|
|
||
| ## E2E Table (per request) | ||
|
|
||
| | Field | Meaning | | ||
| |---------------------------|-----------------------------------------------------------------------| | ||
| | `e2e_total_ms` | End-to-end latency in ms. | | ||
| | `e2e_total_tokens` | Total tokens for the request (stage0 input + all stage outputs). | | ||
| | `transfers_total_time_ms` | Sum of transfer edge `total_time_ms` for this request. | | ||
| | `transfers_total_kbytes` | Sum of transfer kbytes for this request. | | ||
|
|
||
|
|
||
| --- | ||
|
|
||
| ## Stage Table (per stage event / request) | ||
|
|
||
| | Field | Meaning | | ||
| |---------------------|-------------------------------------------------------------------------------------| | ||
| | `batch_id` | Batch index. | | ||
| | `batch_size` | Batch size. | | ||
| | `num_tokens_in` | Input tokens to the stage. | | ||
| | `num_tokens_out` | Output tokens from the stage. | | ||
| | `preprocess_time_ms` | Preprocessing time in ms. | | ||
| | `stage_gen_time_ms` | Stage compute time in ms, excluding preprocessing time (reported separately as `preprocess_time_ms`). | | ||
|
||
|
|
||
| --- | ||
|
|
||
| ## Transfer Table (per edge / request) | ||
|
|
||
| | Field | Meaning | | ||
| |----------------------|---------------------------------------------------------------------------| | ||
| | `size_kbytes` | Total kbytes transferred. | | ||
| | `tx_time_ms` | Sender transfer time in ms. | | ||
| | `rx_decode_time_ms` | Receiver decode time in ms. | | ||
| | `in_flight_time_ms` | In-flight time in ms. | | ||
|
|
||
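The per-edge fields can be pictured with a small data class. This is only a sketch: `TransferEdge` is a hypothetical name, and the conversion of 1 kbyte = 1024 bytes is an assumption inferred from the unit test in this PR, where `size_bytes=1024` is reported as `size_kbytes == 1.0`.

```python
from dataclasses import dataclass


@dataclass
class TransferEdge:
    """Hypothetical container for one transfer-table row (not the vLLM-omni class)."""

    size_bytes: int
    tx_time_ms: float
    rx_decode_time_ms: float
    in_flight_time_ms: float

    @property
    def size_kbytes(self) -> float:
        # Assumed conversion: 1 kbyte = 1024 bytes.
        return self.size_bytes / 1024.0

    @property
    def total_time_ms(self) -> float:
        # Per-edge total: sender time + receiver decode time + in-flight time.
        return self.tx_time_ms + self.rx_decode_time_ms + self.in_flight_time_ms


# Same numbers as the 0->1 edge in the accompanying unit test.
edge = TransferEdge(
    size_bytes=1024,
    tx_time_ms=5.0,
    rx_decode_time_ms=5.0,
    in_flight_time_ms=2.0,
)
```

For this edge, `size_kbytes` is 1.0 and `total_time_ms` is 12.0.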
| ## Expected relationships between the numbers | ||
|
||
| - `e2e_total_tokens` = stage 0's `num_tokens_in` + the sum of `num_tokens_out` over all stages (stage 0 included). | ||
| - `transfers_total_time_ms` = the sum of `tx_time_ms + rx_decode_time_ms + in_flight_time_ms` over every transfer edge of the request. | ||
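These two relationships can be checked against a summary with a few lines of Python. The sketch below assumes the stage and transfer rows are plain dicts keyed by the field names from the tables above; the helper names are hypothetical and missing keys are treated as zero for the purpose of the example.

```python
from __future__ import annotations


def expected_e2e_total_tokens(stages: list[dict]) -> int:
    """Stage 0 input tokens plus the output tokens of every stage."""
    if not stages:
        return 0
    stage0_input = stages[0].get("num_tokens_in", 0)
    all_outputs = sum(stage.get("num_tokens_out", 0) for stage in stages)
    return stage0_input + all_outputs


def expected_transfers_total_time_ms(transfers: list[dict]) -> float:
    """Sum of tx, receiver decode, and in-flight time over every edge."""
    return sum(
        (
            edge.get("tx_time_ms", 0.0)
            + edge.get("rx_decode_time_ms", 0.0)
            + edge.get("in_flight_time_ms", 0.0)
            for edge in transfers
        ),
        0.0,
    )


# Numbers mirror the two-stage unit test in this PR.
stages = [
    {"num_tokens_in": 3, "num_tokens_out": 3},
    {"num_tokens_out": 4},
]
transfers = [
    {"tx_time_ms": 5.0, "rx_decode_time_ms": 5.0, "in_flight_time_ms": 2.0},
]

total_tokens = expected_e2e_total_tokens(stages)
total_transfer_ms = expected_transfers_total_time_ms(transfers)
```

Here `total_tokens` is 3 + 3 + 4 = 10, which is the value the unit test asserts for `e2e_total_tokens`, and `total_transfer_ms` is 5 + 5 + 2 = 12.0.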
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| from __future__ import annotations | ||
|
||
|
|
||
| from vllm_omni.metrics import OrchestratorAggregator | ||
| from vllm_omni.metrics.stats import RequestE2EStats | ||
|
|
||
|
|
||
| def _get_request_entry(table: list[dict], request_id: str) -> dict: | ||
|     """Return the table row for `request_id`, failing the test if it is absent.""" | ||
|     for entry in table: | ||
|         if entry.get("request_id") == request_id: | ||
|             return entry | ||
|     raise AssertionError(f"request_id={request_id} not found") | ||
|
|
||
|
|
||
| def test_orchestrator_aggregator_builds_summary() -> None: | ||
|     agg = OrchestratorAggregator(num_stages=2, log_stats=False, wall_start_ts=0.0) | ||
|     agg.stage_first_ts[0] = 0.0 | ||
|     agg.stage_last_ts[0] = 0.03 | ||
|     agg.stage_first_ts[1] = 0.05 | ||
|     agg.stage_last_ts[1] = 0.07 | ||
|
|
||
|     # One request ("r1") flows from stage 0 to stage 1 over a single edge. | ||
|     agg.on_forward(0, 1, "r1", size_bytes=1024, tx_ms=5.0, used_shm=False) | ||
|     agg.on_stage_metrics( | ||
|         0, | ||
|         "r1", | ||
|         { | ||
|             "num_tokens_in": 3, | ||
|             "num_tokens_out": 3, | ||
|             "stage_gen_time_ms": 30.0, | ||
|             "batch_id": 1, | ||
|             "batch_size": 1, | ||
|             "rx_transfer_bytes": 0, | ||
|             "rx_decode_time_ms": 0.0, | ||
|         }, | ||
|     ) | ||
|     agg.on_stage_metrics( | ||
|         1, | ||
|         "r1", | ||
|         { | ||
|             "num_tokens_out": 4, | ||
|             "stage_gen_time_ms": 20.0, | ||
|             "batch_id": 1, | ||
|             "batch_size": 1, | ||
|             "rx_transfer_bytes": 1024, | ||
|             "rx_decode_time_ms": 5.0, | ||
|             "rx_in_flight_time_ms": 2.0, | ||
|         }, | ||
|     ) | ||
|     agg.on_finalize_request(1, "r1", req_start_ts=0.0) | ||
|
|
||
|     summary = agg.build_and_log_summary(final_stage_id_to_prompt={"r1": 1}) | ||
|     overall = summary["overall_summary"] | ||
|     assert overall["e2e_requests"] == 1 | ||
|
|
||
|     stage_entry = _get_request_entry(summary["stage_table"], "r1") | ||
|     stage_ids = [row["stage_id"] for row in stage_entry["stages"]] | ||
|     assert stage_ids == [0, 1] | ||
|
|
||
|     transfer_entry = _get_request_entry(summary["trans_table"], "r1") | ||
|     assert transfer_entry["transfers"][0]["edge"] == "0->1" | ||
|     # 1024 bytes is reported as 1.0 kbytes. | ||
|     assert transfer_entry["transfers"][0]["size_kbytes"] == 1.0 | ||
|
|
||
|     e2e_entry = _get_request_entry(summary["e2e_table"], "r1") | ||
|     # 3 (stage 0 input) + 3 (stage 0 output) + 4 (stage 1 output) | ||
|     assert e2e_entry["e2e_total_tokens"] == 10 | ||
|
|
||
|
|
||
| def test_build_and_log_summary_e2e_only() -> None: | ||
|     agg = OrchestratorAggregator(num_stages=1, log_stats=False, wall_start_ts=0.0) | ||
|     # Inject an E2E record directly, without any stage or transfer events. | ||
|     agg.e2e_events.append( | ||
|         RequestE2EStats( | ||
|             request_id="r", | ||
|             e2e_total_ms=10.0, | ||
|             e2e_total_tokens=5, | ||
|             transfers_total_time_ms=0.0, | ||
|             transfers_total_bytes=0, | ||
|         ) | ||
|     ) | ||
|
|
||
|     summary = agg.build_and_log_summary(final_stage_id_to_prompt=0) | ||
|     e2e_entry = _get_request_entry(summary["e2e_table"], "r") | ||
|     assert e2e_entry["e2e_total_tokens"] == 5 | ||
|     # The request still appears in the stage table, with no stage rows. | ||
|     stage_entry = _get_request_entry(summary["stage_table"], "r") | ||
|     assert stage_entry["stages"] == [] | ||
|
|
||
|
|
||
| def test_build_and_log_summary_multiple_requests() -> None: | ||
|     agg = OrchestratorAggregator(num_stages=1, log_stats=False, wall_start_ts=0.0) | ||
|
|
||
|     # First request on the single stage. | ||
|     agg.on_stage_metrics( | ||
|         0, | ||
|         "r1", | ||
|         { | ||
|             "num_tokens_in": 2, | ||
|             "num_tokens_out": 4, | ||
|             "batch_id": 1, | ||
|             "batch_size": 1, | ||
|             "stage_gen_time_ms": 10.0, | ||
|             "rx_transfer_bytes": 0, | ||
|             "rx_decode_time_ms": 0.0, | ||
|             "rx_in_flight_time_ms": 0.0, | ||
|         }, | ||
|     ) | ||
|     agg.on_finalize_request(0, "r1", req_start_ts=0.0) | ||
|
|
||
|     # Second request on the same stage, in a different batch. | ||
|     agg.on_stage_metrics( | ||
|         0, | ||
|         "r2", | ||
|         { | ||
|             "num_tokens_in": 1, | ||
|             "num_tokens_out": 2, | ||
|             "batch_id": 2, | ||
|             "batch_size": 1, | ||
|             "stage_gen_time_ms": 12.0, | ||
|             "rx_transfer_bytes": 0, | ||
|             "rx_decode_time_ms": 0.0, | ||
|             "rx_in_flight_time_ms": 0.0, | ||
|         }, | ||
|     ) | ||
|     agg.on_finalize_request(0, "r2", req_start_ts=0.0) | ||
|
|
||
|     summary = agg.build_and_log_summary(final_stage_id_to_prompt=0) | ||
|     # Each request gets its own entry in both tables. | ||
|     assert len(summary["stage_table"]) == 2 | ||
|     assert {entry["request_id"] for entry in summary["e2e_table"]} == {"r1", "r2"} | ||