[Feature] Opt metrics structure #891
Open
LJH-LBJ
wants to merge
138
commits into
vllm-project:main
from
LJH-LBJ:opt_metrics_structure
+1,154
−727
Changes from 118 commits
Commits
501c5ab
opt metrics structure
LJH-LBJ d261ba5
opt loggers
LJH-LBJ 8538816
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ 89493ac
opt metrics structure
LJH-LBJ f69ed2d
fix bug
LJH-LBJ e3a44db
fix bug
LJH-LBJ da0ad3d
fix bug
LJH-LBJ cb85a3d
fix bug
LJH-LBJ 371beae
fix bug
LJH-LBJ 2b98563
opt loggers
LJH-LBJ 134a901
opt metrics structure
LJH-LBJ ee12352
opt format
LJH-LBJ 0af170c
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ cf7e2c0
opt metrics structure
LJH-LBJ eb51d12
opt metrics structure
LJH-LBJ 38785ed
opt metrics structure
LJH-LBJ aeb5fd6
opt test
LJH-LBJ 924f747
opt loggers
LJH-LBJ 8fd556b
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ a78af4d
fix bug
LJH-LBJ 44c9635
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 7c08073
fix bug
LJH-LBJ 443022b
fix bug
LJH-LBJ fbbce79
fix bug
LJH-LBJ 2b2edfc
fix bug
LJH-LBJ a42a656
opt metrics in offline
LJH-LBJ e7f3fae
fix bug
LJH-LBJ 2e87f70
fix bug
LJH-LBJ 93b53f8
fix pre-commit
LJH-LBJ d584e71
fix bug
LJH-LBJ d359775
fix bug
LJH-LBJ 6ebe5ee
Merge branch 'main' into opt_metrics_structure
LJH-LBJ ab248bb
Merge remote-tracking branch 'origin/main' into opt_metrics_structure
LJH-LBJ 04230c8
fix bug
LJH-LBJ fdfc9b5
fix bug
LJH-LBJ b5d154a
add audio frames
LJH-LBJ 274a784
add audio frames
LJH-LBJ 43a266b
add image image_num and resolution
LJH-LBJ e4ff53e
add image image_num and resolution
LJH-LBJ 13a87f2
add image image_num and resolution
LJH-LBJ bacd480
add audio frames in offline
LJH-LBJ b339c38
add audio frames in offline
LJH-LBJ 935481c
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 2263dd1
fix pre-commit
LJH-LBJ f3b88b1
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ f0bdfaa
fix pre-commit
LJH-LBJ da7a271
change enable_stats to log_stats
LJH-LBJ bcd9ac4
fix bug
LJH-LBJ abf941e
fix pre-commit
LJH-LBJ 03afeaf
delete 0 row
LJH-LBJ 842af89
delete 0 row
LJH-LBJ 56ecac3
fix pre-commit
LJH-LBJ cbdac45
delete 0 row
LJH-LBJ 8ee59ce
delete 0 row
LJH-LBJ 48707f0
opt
LJH-LBJ 9d76475
fix pre-commit
LJH-LBJ 2631578
fix bug
LJH-LBJ e0ce96f
fix pre-commit
LJH-LBJ 04da676
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 26e18b3
Merge branch 'main' into opt_metrics_structure
LJH-LBJ a8bcbc0
fix pre-commit
LJH-LBJ 114a6a3
fix pre-commit
LJH-LBJ 9126e68
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 0b905cf
fix bug
LJH-LBJ 3481dcc
Merge branch 'main' into opt_metrics_structure
hsliuustc0106 7665b29
opt
LJH-LBJ c78d420
opt
LJH-LBJ 2b37f16
opt
LJH-LBJ 78963fb
opt
LJH-LBJ 141d8f8
remove ut in test_async_omni
LJH-LBJ 7c95eb9
fix pre-commit
LJH-LBJ 6687f65
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 68074ac
add test in pipeline.yaml
LJH-LBJ ef34329
fix bug
LJH-LBJ bff608c
Merge branch 'main' into opt_metrics_structure
LJH-LBJ a59c766
fix bug
LJH-LBJ 4918ab1
fix pre-commit
LJH-LBJ 4976551
rerun
LJH-LBJ d646401
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 5efbd55
Merge branch 'main' into opt_metrics_structure
LJH-LBJ e83a338
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 55c11c1
opt test
LJH-LBJ a94349b
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 6e63657
rerun
LJH-LBJ 9a31bae
rerun
LJH-LBJ 232da73
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 13b0050
rerun
LJH-LBJ db0d866
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 21de7db
fix bug
LJH-LBJ dd051b2
fix pre-commit
LJH-LBJ dd73daf
Merge branch 'main' into opt_metrics_structure
LJH-LBJ c1c48f9
Merge branch 'main' into opt_metrics_structure
hsliuustc0106 4e6acbe
Merge branch 'main' into opt_metrics_structure
LJH-LBJ d3c6f54
rerun
LJH-LBJ bd6d8cd
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 654073f
Merge branch 'main' into opt_metrics_structure
LJH-LBJ c9068a7
add doc
LJH-LBJ 6626d62
Merge branch 'main' into opt_metrics_structure
LJH-LBJ fe0e4b9
add doc
LJH-LBJ 89f3944
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 3b311f4
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 4b39808
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 3fff139
fix pre-commit
LJH-LBJ b9c2d46
fix pre-commit
LJH-LBJ 0bb732e
opt
LJH-LBJ 7c91e96
fix pre-commit
LJH-LBJ da335c7
opt
LJH-LBJ 5abc397
opt
LJH-LBJ fb3bacf
fix pre-commit
LJH-LBJ 51f5e0a
fix pre-commit
LJH-LBJ 41db219
opt
LJH-LBJ 3a95be0
fix pre-commit
LJH-LBJ 571f297
Merge branch 'vllm-project:main' into opt_metrics_structure
LJH-LBJ ca2cb26
fix bug
LJH-LBJ 48a519c
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 1bd59d8
fix bug
LJH-LBJ 24f8bc8
fix bug
LJH-LBJ ef2d5d6
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 23a24ee
fix pre-commit
LJH-LBJ dd5d7b7
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 41c58d4
fix bug
LJH-LBJ 9145181
Merge branch 'main' into opt_metrics_structure
LJH-LBJ f1195f8
fix ut
LJH-LBJ 4383b01
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 41482ff
fix ut
LJH-LBJ f07d070
Merge branch 'main' into opt_metrics_structure
LJH-LBJ 764151d
fix ut
LJH-LBJ f1b41d3
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ 382327e
fix dependencies
LJH-LBJ 75be00c
opt stage_wall_time_ms and move metrics.md tp contributing
LJH-LBJ 3ffa4cd
add stage's final_output_type in StageRequestStats
LJH-LBJ e352716
fix bug
LJH-LBJ 7faa2e2
rerun
LJH-LBJ 42f6f0f
update doc
LJH-LBJ e7c502f
opt
LJH-LBJ a71fa64
Merge branch 'main' into opt_metrics_structure
LJH-LBJ fd9d3d4
fix pre-commit
LJH-LBJ 00e7b78
Merge branch 'opt_metrics_structure' of https://github.com/LJH-LBJ/vl…
LJH-LBJ
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,156 @@ | ||
|
|
||
| # Production Metrics | ||
|
|
||
| You can use these metrics in production to monitor the health and performance of the vLLM-omni system. Typical scenarios include: | ||
| - **Performance Monitoring**: Track throughput (e.g., `e2e_avg_tokens_per_s`), latency (e.g., `e2e_total_ms`), and resource utilization to verify that the system meets your throughput and latency targets. | ||
| - **Debugging and Troubleshooting**: Use detailed per-request metrics to diagnose issues, such as high transfer times or unexpected token counts. | ||
|
|
||
| ## How to Enable and View Metrics | ||
|
|
||
| ### 1. Start the Service with Metrics Logging | ||
|
|
||
| ```bash | ||
| vllm serve /workspace/models/Qwen3-Omni-30B-A3B-Instruct --omni --port 8014 --log-stats | ||
| ``` | ||
|
|
||
| ### 2. Send a Request | ||
|
|
||
| ```bash | ||
| python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image | ||
| ``` | ||
|
|
||
| ### 3. What You Will See | ||
|
|
||
| With `--log-stats` enabled, the server logs detailed metrics after each request. Example output: | ||
|
|
||
|
|
||
| #### Overall Summary | ||
|
|
||
| | Field | Value | | ||
| |-----------------------------|--------------| | ||
| | e2e_requests | 1 | | ||
| | e2e_wall_time_ms | 41,299.190 | | ||
| | e2e_total_tokens | 5,202 | | ||
| | e2e_avg_time_per_request_ms | 41,299.190 | | ||
| | e2e_avg_tokens_per_s | 125.959 | | ||
| | stage_wall_time_ms | 10,192.289, 30,541.409, 207.496 | | ||
|
|
||
| #### RequestE2EStats | ||
|
|
||
| | Field | Value | | ||
| |-------------------------|------------| | ||
| | e2e_total_ms | 41,299.133 | | ||
| | e2e_total_tokens | 5,202 | | ||
| | transfers_total_time_ms | 245.895 | | ||
| | transfers_total_kbytes | 138,089.939| | ||
|
|
||
| #### StageRequestStats | ||
|
|
||
| | Field | 0 | 1 | 2 | | ||
| |------------------------|--------|--------|--------| | ||
| | audio_generated_frames | 0 | 0 | 525,525| | ||
| | batch_id | 38 | 274 | 0 | | ||
| | batch_size | 1 | 1 | 1 | | ||
| | num_tokens_in | 4,860 | 4,826 | 4,384 | | ||
| | num_tokens_out | 67 | 275 | 0 | | ||
| | postprocess_time_ms | 256.158| 0.491 | 0.000 | | ||
| | stage_gen_time_ms | 9,910.007|30,379.198|160.745| | ||
|
|
||
| #### TransferEdgeStats | ||
|
|
||
| | Field | 0->1 | 1->2 | | ||
| |---------------------|-------------|------------| | ||
| | size_kbytes | 109,277.349 | 28,812.591 | | ||
| | tx_time_ms | 78.701 | 18.790 | | ||
| | rx_decode_time_ms | 111.865 | 31.706 | | ||
| | in_flight_time_ms | 2.015 | 2.819 | | ||
|
|
||
|
|
||
| These logs include: | ||
| - **Overall summary**: total requests, wall time, average tokens/sec, etc. | ||
| - **E2E table**: per-request latency and token counts. | ||
| - **Stage table**: per-stage batch and timing details. | ||
| - **Transfer table**: data transfer and timing for each edge. | ||
|
|
||
| You can use these logs to monitor system health, debug performance, and analyze request-level metrics as described above. | ||
|
|
||
| ## Parameter Details | ||
|
|
||
| | Field | Meaning | | ||
| |---------------------------|----------------------------------------------------------------------------------------------| | ||
| | `e2e_requests` | Number of completed requests. | | ||
| | `e2e_wall_time_ms` | Wall-clock time span from run start to last completion, in ms. | | ||
| | `e2e_total_tokens` | Total tokens counted across all completed requests (stage0 input + all stage outputs). | | ||
| | `e2e_avg_time_per_request_ms` | Average wall time per request: `e2e_wall_time_ms / e2e_requests`. | | ||
| | `e2e_avg_tokens_per_s` | Average token throughput over wall time: `e2e_total_tokens * 1000 / e2e_wall_time_ms`. | | ||
| | `stage_wall_time_ms` | Wall-clock time span for each stage, in ms (list format). | | ||
|
|
||
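The two averaged fields in the summary follow directly from the raw totals. The sketch below recomputes them from the example values shown earlier; the function name `derive_summary` is hypothetical and is not part of the vLLM-Omni code base.

```python
def derive_summary(e2e_requests: int, e2e_wall_time_ms: float, e2e_total_tokens: int) -> dict:
    """Recompute the averaged summary fields from the raw totals (illustrative helper)."""
    return {
        # Average wall time per request: e2e_wall_time_ms / e2e_requests
        "e2e_avg_time_per_request_ms": e2e_wall_time_ms / e2e_requests,
        # Token throughput over wall time: e2e_total_tokens * 1000 / e2e_wall_time_ms
        "e2e_avg_tokens_per_s": e2e_total_tokens * 1000 / e2e_wall_time_ms,
    }


# Values taken from the "Overall Summary" example.
summary = derive_summary(e2e_requests=1, e2e_wall_time_ms=41299.190, e2e_total_tokens=5202)
print(f"{summary['e2e_avg_time_per_request_ms']:,.3f}")  # 41,299.190
print(f"{summary['e2e_avg_tokens_per_s']:.3f}")          # 125.959
```

Both results agree with the logged `e2e_avg_time_per_request_ms` and `e2e_avg_tokens_per_s` values in the example.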
| --- | ||
|
|
||
| ## E2E Table (per request) | ||
|
|
||
| | Field | Meaning | | ||
| |---------------------------|-----------------------------------------------------------------------| | ||
| | `e2e_total_ms` | End-to-end latency in ms. | | ||
| | `e2e_total_tokens` | Total tokens for the request (stage0 input + all stage outputs). | | ||
| | `transfers_total_time_ms` | Sum over all transfer edges of `tx_time_ms + rx_decode_time_ms + in_flight_time_ms` for this request. | | ||
| | `transfers_total_kbytes` | Sum of transfer kbytes for this request. | | ||
|
|
||
|
|
||
| --- | ||
|
|
||
| ## Stage Table (per stage event / request) | ||
|
|
||
| | Field | Meaning | | ||
| |---------------------------|-------------------------------------------------------------------------------------------------| | ||
| | `batch_id` | Batch index. | | ||
| | `batch_size` | Batch size. | | ||
| | `num_tokens_in` | Input tokens to the stage. | | ||
| | `num_tokens_out` | Output tokens from the stage. | | ||
| | `postprocess_time_ms` | Postprocessing time in ms (also reported for diffusion/image stages). | | ||
| | `stage_gen_time_ms` | Stage compute time in ms, excluding postprocessing time (reported separately as `postprocess_time_ms`). | | ||
| | `image_num` | Number of images generated (for diffusion/image stages). | | ||
| | `resolution` | Image resolution (for diffusion/image stages). | | ||
| | `trajectory_timesteps` | Diffusion/image: trajectory timesteps, if available. | | ||
|
|
||
| --- | ||
|
|
||
| ## Transfer Table (per edge / request) | ||
|
|
||
| | Field | Meaning | | ||
| |----------------------|---------------------------------------------------------------------------| | ||
| | `size_kbytes` | Total kbytes transferred. | | ||
| | `tx_time_ms` | Sender transfer time in ms. | | ||
| | `rx_decode_time_ms` | Receiver decode time in ms. | | ||
| | `in_flight_time_ms` | In-flight time in ms. | | ||
|
|
||
|
|
||
| ## Verifying the Numbers | ||
|
|
||
| **Formulas:** | ||
| - `e2e_total_tokens = Stage0's num_tokens_in + sum(all stages' num_tokens_out)` | ||
| - `transfers_total_time_ms = sum(tx_time_ms + rx_decode_time_ms + in_flight_time_ms)`, summed over every edge | ||
|
|
||
| **Using the example above:** | ||
|
|
||
| ### e2e_total_tokens | ||
| - Stage0's `num_tokens_in`: **4,860** | ||
| - Stage0's `num_tokens_out`: **67** | ||
| - Stage1's `num_tokens_out`: **275** | ||
| - Stage2's `num_tokens_out`: **0** | ||
|
|
||
| So, | ||
| ``` | ||
| e2e_total_tokens = 4,860 + 67 + 275 + 0 = 5,202 | ||
| ``` | ||
| This matches the table value: `e2e_total_tokens = 5,202`. | ||
|
|
||
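The token formula can be checked mechanically. This is a minimal sketch assuming the per-stage values have been copied by hand from the `StageRequestStats` example into a list of dicts; the data layout and the helper name `e2e_total_tokens_from_stages` are illustrative, not the project's internal representation.

```python
# Per-stage token counts copied from the StageRequestStats example (stages 0, 1, 2).
stage_stats = [
    {"num_tokens_in": 4860, "num_tokens_out": 67},
    {"num_tokens_in": 4826, "num_tokens_out": 275},
    {"num_tokens_in": 4384, "num_tokens_out": 0},
]


def e2e_total_tokens_from_stages(stages: list) -> int:
    """Stage 0 input tokens plus the output tokens of every stage."""
    return stages[0]["num_tokens_in"] + sum(s["num_tokens_out"] for s in stages)


total_tokens = e2e_total_tokens_from_stages(stage_stats)
print(total_tokens)  # 5202
```

Only stage 0 contributes its input tokens; the inputs of later stages are not counted again.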
| ### transfers_total_time_ms | ||
| For each edge: | ||
| - 0->1: tx_time_ms (**78.701**) + rx_decode_time_ms (**111.865**) + in_flight_time_ms (**2.015**) = **192.581** | ||
| - 1->2: tx_time_ms (**18.790**) + rx_decode_time_ms (**31.706**) + in_flight_time_ms (**2.819**) = **53.315** | ||
|
|
||
| Sum: 192.581 + 53.315 = **245.896** | ||
|
|
||
| The table shows `transfers_total_time_ms = 245.895`, which matches the calculation; the 0.001 ms difference arises because the displayed per-edge values are rounded to three decimal places. | ||
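The same check can be scripted for both transfer totals. The sketch below assumes the per-edge values are copied by hand from the `TransferEdgeStats` example; the dict layout is illustrative. The 0.002 tolerance is an assumption chosen to absorb the three-decimal rounding of the displayed values.

```python
# Per-edge values copied from the TransferEdgeStats example.
edges = {
    "0->1": {"size_kbytes": 109277.349, "tx_time_ms": 78.701,
             "rx_decode_time_ms": 111.865, "in_flight_time_ms": 2.015},
    "1->2": {"size_kbytes": 28812.591, "tx_time_ms": 18.790,
             "rx_decode_time_ms": 31.706, "in_flight_time_ms": 2.819},
}

# Totals reported in the RequestE2EStats example.
logged_total_ms = 245.895
logged_total_kbytes = 138089.939

# Sum tx + rx_decode + in_flight over every edge, and sum the payload sizes.
total_ms = sum(
    e["tx_time_ms"] + e["rx_decode_time_ms"] + e["in_flight_time_ms"]
    for e in edges.values()
)
total_kbytes = sum(e["size_kbytes"] for e in edges.values())

# Allow a small tolerance because the logged values are rounded for display.
TOLERANCE = 0.002
print(round(total_ms, 3), abs(total_ms - logged_total_ms) < TOLERANCE)
print(round(total_kbytes, 3), abs(total_kbytes - logged_total_kbytes) < TOLERANCE)
```

The size total agrees with `transfers_total_kbytes` in the same way: 109,277.349 + 28,812.591 = 138,089.940 against a logged 138,089.939.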
Please do a final check of the .md file and test the output data again.
Already checked the .md file and updated the test results.