[Feature] Opt metrics structure #891

# Metrics in vLLM-Omni

You can use these metrics in production to monitor the health and performance of a vLLM-Omni deployment. Typical scenarios include:

- **Performance Monitoring**: Track throughput (e.g., `e2e_avg_tokens_per_s`), latency (e.g., `e2e_total_ms`), and resource utilization to verify that the system meets its expected performance targets.
- **Debugging and Troubleshooting**: Use detailed per-request metrics to diagnose issues such as long transfer times or unexpected token counts.
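
As a minimal sketch of the performance-monitoring scenario, the snippet below flags a request whose summary throughput is low or whose transfer time is a large share of its end-to-end latency. The field names come from the tables in this document; the `check_summary` helper and both threshold values are hypothetical and would need tuning for a real deployment.

```python
from typing import Dict, List

# Hypothetical alert thresholds; tune them for your own deployment.
MIN_TOKENS_PER_S = 100.0
MAX_TRANSFER_SHARE = 0.05  # transfers should stay under 5% of end-to-end latency


def check_summary(summary: Dict[str, float], request: Dict[str, float]) -> List[str]:
    """Return warnings for one summary/request pair (hypothetical helper)."""
    warnings = []
    if summary["e2e_avg_tokens_per_s"] < MIN_TOKENS_PER_S:
        warnings.append(
            f"throughput {summary['e2e_avg_tokens_per_s']:.1f} tok/s is below {MIN_TOKENS_PER_S}"
        )
    # Fraction of the request's latency spent moving data between stages.
    transfer_share = request["transfers_total_time_ms"] / request["e2e_total_ms"]
    if transfer_share > MAX_TRANSFER_SHARE:
        warnings.append(f"transfers take {transfer_share:.1%} of end-to-end latency")
    return warnings


# Values copied from the example log in the "What You Will See" section.
summary = {"e2e_avg_tokens_per_s": 125.959}
request = {"e2e_total_ms": 41_299.133, "transfers_total_time_ms": 245.895}
print(check_summary(summary, request))  # prints [] because both checks pass
```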

## How to Enable and View Metrics

### 1. Start the Service with Metrics Logging

Pass `--log-stats` when starting the server:

```bash
vllm serve /workspace/models/Qwen3-Omni-30B-A3B-Instruct --omni --port 8014 --log-stats
```

### 2. Send a Request

For example, using the multimodal chat-completion client with an image query:

```bash
python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image
```

### 3. What You Will See

With `--log-stats` enabled, the server logs detailed metrics after each request. Example output:

#### Overall Summary

| Field                       | Value                                                          |
|-----------------------------|----------------------------------------------------------------|
| e2e_requests                | 1                                                              |
| e2e_wall_time_ms            | 41,299.190                                                     |
| e2e_total_tokens            | 5,202                                                          |
| e2e_avg_time_per_request_ms | 41,299.190                                                     |
| e2e_avg_tokens_per_s        | 125.959                                                        |
| stage_wall_time_ms          | 10,192.289 (stage 0); 30,541.409 (stage 1); 207.496 (stage 2) |

#### RequestE2EStats

| Field                   | Value       |
|-------------------------|-------------|
| e2e_total_ms            | 41,299.133  |
| e2e_total_tokens        | 5,202       |
| transfers_total_time_ms | 245.895     |
| transfers_total_kbytes  | 138,089.939 |

#### StageRequestStats

| Field                  | Stage 0   | Stage 1    | Stage 2 |
|------------------------|-----------|------------|---------|
| audio_generated_frames | 0         | 0          | 525,525 |
| batch_id               | 38        | 274        | 0       |
| batch_size             | 1         | 1          | 1       |
| num_tokens_in          | 4,860     | 4,826      | 4,384   |
| num_tokens_out         | 67        | 275        | 0       |
| postprocess_time_ms    | 256.158   | 0.491      | 0.000   |
| stage_gen_time_ms      | 9,910.007 | 30,379.198 | 160.745 |

#### TransferEdgeStats

| Field             | 0->1        | 1->2       |
|-------------------|-------------|------------|
| size_kbytes       | 109,277.349 | 28,812.591 |
| tx_time_ms        | 78.701      | 18.790     |
| rx_decode_time_ms | 111.865     | 31.706     |
| in_flight_time_ms | 2.015       | 2.819      |

These logs include:
- **Overall summary**: total requests, wall time, average tokens per second, etc.
- **E2E table**: per-request latency, token counts, and transfer totals.
- **Stage table**: per-stage batch, token, and timing details.
- **Transfer table**: data size and timing for each edge between stages.

You can use these logs to monitor system health, debug performance issues, and analyze request-level metrics.

## Parameter Details

| Field                         | Meaning                                                                                 |
|-------------------------------|-----------------------------------------------------------------------------------------|
| `e2e_requests`                | Number of completed requests.                                                           |
| `e2e_wall_time_ms`            | Wall-clock time span from run start to last completion, in ms.                          |
| `e2e_total_tokens`            | Total tokens counted across all completed requests (stage 0 input + all stage outputs). |
| `e2e_avg_time_per_request_ms` | Average wall time per request: `e2e_wall_time_ms / e2e_requests`.                       |
| `e2e_avg_tokens_per_s`        | Average token throughput over wall time: `e2e_total_tokens * 1000 / e2e_wall_time_ms`.  |
| `stage_wall_time_ms`          | Wall-clock time span for each stage, in ms (a list with one value per stage).           |

These summary fields aggregate over the whole run in the offline case. In online serving, each log covers a single request, so `e2e_requests` is always 1.

**Review comment (Contributor):** Should it be clarified that this applies to the offline case? For the online case, it actually tracks only individual requests (`e2e_requests` is always 1) rather than a summary over the entire online process. Is that understanding correct?

**Author:** Yes, OK. (LJH-LBJ marked this conversation as resolved.)

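For an offline run, the summary fields can be derived from per-request completion records roughly as follows. This is a sketch under assumptions: `RequestRecord`, its `finished_at_ms` timestamp, and `summarize` are hypothetical names, not vLLM-Omni APIs; only the output keys and the formulas come from the table above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    # Hypothetical per-request record: completion time (ms since run start) and token count.
    finished_at_ms: float
    total_tokens: int


def summarize(run_start_ms: float, records: List[RequestRecord]) -> Dict[str, float]:
    """Aggregate completed requests into the overall-summary fields."""
    n = len(records)
    # Wall-clock span from run start to the last completion (not a sum of latencies).
    wall_ms = max(r.finished_at_ms for r in records) - run_start_ms
    total_tokens = sum(r.total_tokens for r in records)
    return {
        "e2e_requests": n,
        "e2e_wall_time_ms": wall_ms,
        "e2e_total_tokens": total_tokens,
        "e2e_avg_time_per_request_ms": wall_ms / n,
        "e2e_avg_tokens_per_s": total_tokens * 1000 / wall_ms,
    }


# Single-request example matching the logged summary (run starts at t = 0).
stats = summarize(0.0, [RequestRecord(finished_at_ms=41_299.190, total_tokens=5_202)])
print(round(stats["e2e_avg_tokens_per_s"], 3))  # 125.959
```

Passing several records with different completion times shows that `e2e_wall_time_ms` spans from the run start to the last completion, so concurrent requests share the same wall time instead of adding up.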
---

## E2E Table (per request)

| Field                     | Meaning                                                            |
|---------------------------|--------------------------------------------------------------------|
| `e2e_total_ms`            | End-to-end latency of the request, in ms.                          |
| `e2e_total_tokens`        | Total tokens for the request (stage 0 input + all stage outputs).  |
| `transfers_total_time_ms` | Sum of `total_time_ms` over the request's transfer edges.          |
| `transfers_total_kbytes`  | Sum of `size_kbytes` over the request's transfer edges.            |

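The per-request aggregation can be sketched with two small record types. `StageStats`, `EdgeStats`, and `build_request_e2e` are hypothetical names; the sketch assumes each edge's `total_time_ms` equals `tx_time_ms + rx_decode_time_ms + in_flight_time_ms`, the relation the verification section relies on. The example values come from the sample log.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StageStats:
    """Hypothetical per-stage record holding the token counts from the Stage table."""
    num_tokens_in: int
    num_tokens_out: int


@dataclass
class EdgeStats:
    """Hypothetical per-edge record holding the fields from the Transfer table."""
    size_kbytes: float
    tx_time_ms: float
    rx_decode_time_ms: float
    in_flight_time_ms: float

    @property
    def total_time_ms(self) -> float:
        # Edge total = sender time + receiver decode time + in-flight time.
        return self.tx_time_ms + self.rx_decode_time_ms + self.in_flight_time_ms


def build_request_e2e(e2e_total_ms: float, stages: List[StageStats], edges: List[EdgeStats]) -> Dict[str, float]:
    """Aggregate one request's stage and edge stats into the E2E-table fields."""
    return {
        "e2e_total_ms": e2e_total_ms,
        # Stage 0 input tokens plus every stage's output tokens.
        "e2e_total_tokens": stages[0].num_tokens_in + sum(s.num_tokens_out for s in stages),
        "transfers_total_time_ms": sum(e.total_time_ms for e in edges),
        "transfers_total_kbytes": sum(e.size_kbytes for e in edges),
    }


row = build_request_e2e(
    41_299.133,
    [StageStats(4_860, 67), StageStats(4_826, 275), StageStats(4_384, 0)],
    [EdgeStats(109_277.349, 78.701, 111.865, 2.015), EdgeStats(28_812.591, 18.790, 31.706, 2.819)],
)
print(row["e2e_total_tokens"])  # 5202
```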
---

## Stage Table (per stage event / request)

| Field                    | Meaning                                                                                                  |
|--------------------------|----------------------------------------------------------------------------------------------------------|
| `batch_id`               | Batch index.                                                                                             |
| `batch_size`             | Batch size.                                                                                              |
| `num_tokens_in`          | Input tokens to the stage.                                                                               |
| `num_tokens_out`         | Output tokens from the stage.                                                                            |
| `postprocess_time_ms`    | Postprocessing time in ms (also reported for diffusion/image stages).                                    |
| `stage_gen_time_ms`      | Stage compute time in ms, excluding postprocessing time (reported separately as `postprocess_time_ms`). |
| `audio_generated_frames` | Number of audio frames generated by the stage (0 for stages that produce no audio).                      |
| `image_num`              | Number of images generated (for diffusion/image stages).                                                 |
| `resolution`             | Image resolution (for diffusion/image stages).                                                           |
| `trajectory_timesteps`   | Diffusion/image: trajectory timesteps, if available.                                                     |

**Review comment (Contributor):** same with

**Author:** remove `postprocess_time_ms | Postprocessing time in ms`

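Since `stage_gen_time_ms` excludes `postprocess_time_ms`, adding the two gives an estimate of each stage's total time for the request. The hypothetical `stage_time_breakdown` helper below uses that sum to rank the stages in the sample log; treat the share as a rough guide, because it ignores transfer time and any overlap between stages.

```python
from typing import Dict, List, Tuple


def stage_time_breakdown(stages: List[Dict[str, float]]) -> List[Tuple[int, float, float]]:
    """Return (stage index, gen + postprocess ms, share of summed stage time) per stage."""
    totals = [s["stage_gen_time_ms"] + s["postprocess_time_ms"] for s in stages]
    grand_total = sum(totals)
    return [(i, t, t / grand_total) for i, t in enumerate(totals)]


# Per-stage values from the example StageRequestStats table.
stages = [
    {"stage_gen_time_ms": 9_910.007, "postprocess_time_ms": 256.158},
    {"stage_gen_time_ms": 30_379.198, "postprocess_time_ms": 0.491},
    {"stage_gen_time_ms": 160.745, "postprocess_time_ms": 0.000},
]
breakdown = stage_time_breakdown(stages)
slowest = max(breakdown, key=lambda row: row[1])
print(f"stage {slowest[0]} takes {slowest[2]:.0%} of stage time")  # stage 1 takes 75% of stage time
```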
---

## Transfer Table (per edge / request)

| Field               | Meaning                                               |
|---------------------|-------------------------------------------------------|
| `size_kbytes`       | Total kbytes transferred over the edge.               |
| `tx_time_ms`        | Sender-side transfer time in ms.                      |
| `rx_decode_time_ms` | Receiver-side decode time in ms.                      |
| `in_flight_time_ms` | Time in flight between sender and receiver, in ms.    |

**Author (reply to a review comment):** Yes, about 90% of the cost comes from serialization/deserialization and `shm_write`/`shm_read`.

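To see which component dominates each edge, the three timing fields can be compared directly. The `edge_report` helper is hypothetical, and `kbytes_per_ms` is a derived figure computed here, not a field the server logs.

```python
from typing import Dict

TIMING_FIELDS = ("tx_time_ms", "rx_decode_time_ms", "in_flight_time_ms")


def edge_report(edge: Dict[str, float]) -> Dict[str, object]:
    """Summarize one transfer edge: total time, largest component, effective rate."""
    total_ms = sum(edge[f] for f in TIMING_FIELDS)
    dominant = max(TIMING_FIELDS, key=lambda f: edge[f])
    return {
        "total_time_ms": total_ms,
        "dominant": dominant,
        # Derived rate; not a field logged by the server.
        "kbytes_per_ms": edge["size_kbytes"] / total_ms,
    }


# Edge values from the example TransferEdgeStats table.
edges = {
    "0->1": {"size_kbytes": 109_277.349, "tx_time_ms": 78.701, "rx_decode_time_ms": 111.865, "in_flight_time_ms": 2.015},
    "1->2": {"size_kbytes": 28_812.591, "tx_time_ms": 18.790, "rx_decode_time_ms": 31.706, "in_flight_time_ms": 2.819},
}
for name, edge in edges.items():
    report = edge_report(edge)
    print(name, round(report["total_time_ms"], 3), report["dominant"])
# 0->1 192.581 rx_decode_time_ms
# 1->2 53.315 rx_decode_time_ms
```

In the sample log, `rx_decode_time_ms` is the largest component on both edges, while `in_flight_time_ms` is small by comparison.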

## Expected Relationships Between the Numbers (Verification)

**Formulas:**
- `e2e_total_tokens = Stage0's num_tokens_in + sum(all stages' num_tokens_out)`
- `transfers_total_time_ms = sum(tx_time_ms + rx_decode_time_ms + in_flight_time_ms)`, summed over every edge

**Using the example above:**

### e2e_total_tokens
- Stage 0 `num_tokens_in`: **4,860**
- Stage 0 `num_tokens_out`: **67**
- Stage 1 `num_tokens_out`: **275**
- Stage 2 `num_tokens_out`: **0**

So,
```
e2e_total_tokens = 4,860 + 67 + 275 + 0 = 5,202
```
This matches the table value: `e2e_total_tokens = 5,202`.

### transfers_total_time_ms
For each edge:
- 0->1: tx_time_ms (**78.701**) + rx_decode_time_ms (**111.865**) + in_flight_time_ms (**2.015**) = **192.581**
- 1->2: tx_time_ms (**18.790**) + rx_decode_time_ms (**31.706**) + in_flight_time_ms (**2.819**) = **53.315**

Sum: 192.581 + 53.315 = **245.896**

The table shows `transfers_total_time_ms = 245.895`; the 0.001 ms difference is due to rounding.
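
The checks above can be automated. This sketch assumes the values have already been parsed out of the log into Python numbers (the parsing step is not shown) and uses a hypothetical `ROUNDING_TOL` of 0.002 because the log prints three decimals. Besides the two formulas, it applies the same edge sum to `transfers_total_kbytes` and the summary throughput formula to `e2e_avg_tokens_per_s`.

```python
import math

ROUNDING_TOL = 0.002  # logged values are rounded to 3 decimals; allow a little slack

stage0_tokens_in = 4_860
tokens_out = [67, 275, 0]  # num_tokens_out for stages 0, 1, 2
e2e_wall_time_ms = 41_299.190
edges = [
    # (size_kbytes, tx_time_ms, rx_decode_time_ms, in_flight_time_ms)
    (109_277.349, 78.701, 111.865, 2.015),  # 0->1
    (28_812.591, 18.790, 31.706, 2.819),  # 1->2
]
reported = {
    "e2e_total_tokens": 5_202,
    "transfers_total_time_ms": 245.895,
    "transfers_total_kbytes": 138_089.939,
    "e2e_avg_tokens_per_s": 125.959,
}

total_tokens = stage0_tokens_in + sum(tokens_out)
recomputed = {
    "e2e_total_tokens": total_tokens,
    "transfers_total_time_ms": sum(tx + rx + fl for _, tx, rx, fl in edges),
    "transfers_total_kbytes": sum(size for size, *_ in edges),
    "e2e_avg_tokens_per_s": total_tokens * 1000 / e2e_wall_time_ms,
}

for field, value in recomputed.items():
    ok = math.isclose(value, reported[field], abs_tol=ROUNDING_TOL)
    print(f"{field}: recomputed={value:.3f} reported={reported[field]} ok={ok}")
```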

Second changed file (dependency list; hunk `@@ -12,3 +12,4 @@ torchsde>=0.2.6`), which adds `prettytable>=3.8.0`:

- `openai-whisper>=20250625`
- `imageio[ffmpeg]>=2.37.2`
- `sox>=1.5.0`
- `prettytable>=3.8.0` (added)