
--rate-type concurrent CLI parameter is implemented #79


Closed

Conversation

@parfeniukink (Contributor) commented Feb 27, 2025

Execution example

(py38) ➜  guidellm git:(parfeniukink/concurrent-load-generation-v2) python src/guidellm/main.py --target http://localhost:8080/v1 --model Phi-3-mini-4k-instruct-q4.gguf --data 'prompt_tokens=128,generated_tokens=128' --data-type emulated --tokenizer "hf-internal-testing/llama-tokenizer" --max-requests 2 --rate-type concurrent --rate 2
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [14:08:32]   100% concurrent   (0.12 req/sec avg)                                                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  Generating report... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (1/1) [ 0:00:16 < 0:00:00 ]
╭─ GuideLLM Benchmarks Report (stdout) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ╭─ Benchmark Report 1 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │
│ │ Backend(type=openai_server, target=http://localhost:8080/v1, model=Phi-3-mini-4k-instruct-q4.gguf)                                                                                   │ │
│ │ Data(type=emulated, source=prompt_tokens=128,generated_tokens=128, tokenizer=hf-internal-testing/llama-tokenizer)                                                                    │ │
│ │ Rate(type=concurrent, rate=(2.0,))                                                                                                                                                   │ │
│ │ Limits(max_number=2 requests, max_duration=120 sec)                                                                                                                                  │ │
│ │                                                                                                                                                                                      │ │
│ │                                                                                                                                                                                      │ │
│ │ Requests Data by Benchmark                                                                                                                                                           │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓                                                                              │ │
│ │ ┃ Benchmark                 ┃ Requests Completed ┃ Request Failed ┃ Duration  ┃ Start Time ┃ End Time ┃                                                                              │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩                                                                              │ │
│ │ │ concurrent@2.00 req/sec   │ 4/4                │ 0/4            │ 32.05 sec │ 14:08:32   │ 14:09:04 │                                                                              │ │
│ │ └───────────────────────────┴────────────────────┴────────────────┴───────────┴────────────┴──────────┘                                                                              │ │
│ │                                                                                                                                                                                      │ │
│ │ Tokens Data by Benchmark                                                                                                                                                             │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓                                                              │ │
│ │ ┃ Benchmark                 ┃ Prompt ┃ Prompt (1%, 5%, 50%, 95%, 99%)    ┃ Output ┃ Output (1%, 5%, 50%, 95%, 99%)    ┃                                                              │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩                                                              │ │
│ │ │ concurrent@2.00 req/sec   │ 129.00 │ 129.0, 129.0, 129.0, 129.0, 129.0 │ 128.00 │ 128.0, 128.0, 128.0, 128.0, 128.0 │                                                              │ │
│ │ └───────────────────────────┴────────┴───────────────────────────────────┴────────┴───────────────────────────────────┘                                                              │ │
│ │                                                                                                                                                                                      │ │
│ │ Performance Stats by Benchmark                                                                                                                                                       │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │
│ │ ┃                           ┃ Request Latency [1%, 5%, 10%, 50%, 90%, 95%,    ┃ Time to First Token [1%, 5%, 10%, 50%, 90%,     ┃ Inter Token Latency [1%, 5%, 10%, 50%, 90%, 95%, ┃ │ │
│ │ ┃ Benchmark                 ┃ 99%] (sec)                                      ┃ 95%, 99%] (ms)                                  ┃ 99%] (ms)                                        ┃ │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ │
│ │ │ concurrent@2.00 req/sec   │ 8.50, 9.46, 10.65, 20.20, 29.68, 30.85, 31.80   │ 1365.0, 2316.8, 3506.6, 13039.3, 22588.4,       │ 50.8, 51.4, 51.7, 54.7, 62.3, 68.1, 73.6         │ │ │
│ │ │                           │                                                 │ 23781.6, 24736.2                                │                                                  │ │ │
│ │ └───────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────┴──────────────────────────────────────────────────┘ │ │
│ │                                                                                                                                                                                      │ │
│ │ Performance Summary by Benchmark                                                                                                                                                     │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓                                          │ │
│ │ ┃ Benchmark                 ┃ Requests per Second ┃ Request Latency ┃ Time to First Token ┃ Inter Token Latency ┃ Output Token Throughput ┃                                          │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩                                          │ │
│ │ │ concurrent@2.00 req/sec   │ 0.12 req/sec        │ 20.17 sec       │ 13045.15 ms         │ 56.12 ms            │ 15.98 tokens/sec        │                                          │ │
│ │ └───────────────────────────┴─────────────────────┴─────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┘                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
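As a quick sanity check on the summary above: 4 completed requests over 32.05 sec gives 4 / 32.05 ≈ 0.12 req/sec, matching the reported average, and holding 2 requests in flight at a mean latency of ~20.2 sec predicts roughly 2 / 20.2 ≈ 0.10 req/sec at steady state. (The 4/4 count with --max-requests 2 presumably reflects 2 requests per worker across the 2 concurrent workers.)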

tox local report

For some reason, GitHub is having issues with the quality checks. Here is the report from a local run:

[screenshot: local tox report, Alacritty, 2025-02-27]

Use the ``--rate`` CLI parameter to specify the number of concurrent workers
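To make those semantics concrete: with ``--rate-type concurrent``, ``--rate N`` keeps N request streams busy at once, each firing its next request as soon as the previous one returns. A minimal asyncio sketch of that pattern (hypothetical names and a simulated backend call, not the actual GuideLLM scheduler):

```python
# Conceptual sketch of the "concurrent" rate type: a fixed pool of N
# workers, each sending its next request as soon as the previous one
# finishes, so exactly N requests are in flight at any time.
import asyncio
import time


async def send_request(worker_id: int) -> None:
    # Stand-in for a real backend call (e.g., an OpenAI-compatible
    # /v1/completions request); simulated here with a short sleep.
    await asyncio.sleep(0.1)


async def worker(worker_id: int, deadline: float, completed: list) -> None:
    # Each worker issues requests back-to-back until the deadline.
    while time.monotonic() < deadline:
        await send_request(worker_id)
        completed[0] += 1


async def run_concurrent(rate: int, duration: float) -> int:
    completed = [0]
    deadline = time.monotonic() + duration
    await asyncio.gather(*(worker(i, deadline, completed) for i in range(rate)))
    return completed[0]


if __name__ == "__main__":
    # --rate-type concurrent --rate 2 corresponds to rate=2 here.
    total = asyncio.run(run_concurrent(rate=2, duration=1.0))
    print(f"completed {total} requests")
```

With rate=2 and a simulated 0.1 sec latency this completes roughly 20 requests in a second; throughput is about concurrency divided by per-request latency, which is why the real run above sits near 2 / 20.2 req/sec.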
@markurtz (Member) commented
Closing this out as it is being reworked and included in #96

@markurtz closed this Mar 10, 2025
github-project-automation bot moved this from In progress to Done in GuideLLM Kanban Board, Mar 10, 2025
markurtz added a commit that referenced this pull request Apr 11, 2025
…ation Refactor (#96)

Full refactor of GuideLLM, enabling better overall performance and minimal benchmarking overhead through a new multiprocess, threaded scheduler, along with significant updates to the output formats for better analysis, visibility, and clarity.

[screenshot: https://github.com/user-attachments/assets/a723854a-7fe0-4eb2-9408-f632e747c3c2]

Fixes:
- #92 
- #77 
- #47 
- #79

---------

Co-authored-by: Alexandre Marques <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>
Co-authored-by: David Gray <[email protected]>
@markurtz deleted the parfeniukink/concurrent-load-generation-v2 branch, April 21, 2025 15:02
Labels: load-request
Projects: GuideLLM Kanban Board (Status: Done)
3 participants